PREFACE


The material contained in this report is a compilation of students' papers
coming out of my one-semester graduate course at Saint Louis University.

Each  of  the  papers  is  an  expansion  of  a class  lecture  or  topic.  For  various 
reasons,  not  all  of  the  papers  submitted  to  satisfy  course  requirements  have 
been  included. 

The objective of the course was to develop insight into ways of applying
statistics to solve problems. It was not intended to be a course on statistics.
As a result, the papers are not a comprehensive coverage of these subjects.
Furthermore, they are not the last word on the subject. They are not
authoritative coverages of a subject, but more a description or interpretation
of the subject by the student. However, they have been reviewed for technical
content. The intention is to develop interest on the part of the reader to do
further investigating into statistics.

There was some effort made to keep the papers primarily on the subject
of applied meteorology. However, since the course included applications of
statistics to other fields, there are two papers which cover other subjects.
These papers cover applications which discuss techniques that can and, in
some cases, have been directly applied to meteorology elsewhere.


For those wishing some guidelines for more basic probability and statistical
material, I would recommend, in addition to the texts referred to in the
bibliography, books by: Wadsworth and Bryan, Hogg and Craig, Dixon and
Massey. An appendix includes copies of computer programs for doing some
of the analyses discussed in the report.


Time restrictions have made it impractical to properly edit the enclosed
papers for uniformity of content. The report has the character of a compilation
of preprints to a proceeding with each paper reflecting its own style of
presentation.
CHAPTER 1


INTRODUCTION 


by  Maj  Bruce  D.  Altenhof 


1.1 Introduction.

This publication is the result of a St. Louis University graduate-level course on special topics
in Statistical Meteorology taught by Adjunct Professor Dr. Robert G. Miller in the 1977 spring
semester.  Dr.  Miller,  a meteorological  statistician,  is  the  Chief  Scientist  for  the  Air  Weather 
Service.  The  objective  of  this  text,  like  that  of  the  course,  is  to  create  statistical  insight  into 
how  to  solve  problems. 

Two textbooks were used to supplement Dr. Miller's lecture material. The books were
"Multivariate Analysis: Techniques for Educational and Psychological Research" written by
Maurice M. Tatsuoka (1971) and "Statistics: A Guide to the Unknown" edited by Judith M. Tanur
and by Frederick Mosteller, William H. Kruskal, Richard F. Link, Richard S. Pieters, and
Gerald R. Rising (1972). Tatsuoka's book is based on courses he taught in advanced statistics
and multivariate analysis. It provides the additional information required by students to get a
thorough understanding of advanced statistical methods. Tanur's book explores ways in which
statistics can be applied to a variety of problem areas in society. The book contains 44
nontechnical examples of applied statistics and the contributions made in all aspects of today's
society (government, business, science, etc).

In  an  effort  to  stimulate  student  interest  and  involvement,  a class  experiment  was  undertaken 
on  single-station  forecasting  and  is  included  as  a chapter. 

1.2  Statistical  Subjects. 

Numerous  statistical  subjects  were  explained  during  the  course.  The  main  subjects  in  alpha- 
betical order  were: 

(1)  Analysis  of  Variance 

(2)  Analysis  of  Covariance 

(3)  Bayes'  Theorem 

(4)  Canonical  Correlation 

(5)  Clustering 

(6)  Decision  Theory 

(7) Markov Processes

(8)  Monte  Carlo 

(9)  Multiple  Discriminant  Analysis 

(10)  Multivariate  Normal  Distributions 

(11)  Nonparametric  Procedures 

(12)  Orthogonal  Polynomials 

(13)  Pattern  Recognition 

(14)  Regression  Analysis 

(15)  Significance  Testing 

(16)  Simulation  Techniques 

(17)  Stochastic  Processes 

(18)  Transformations 

The Analysis of Variance (ANOVA) technique is a method for testing hypotheses concerning
means of several populations. The ANOVA technique was illustrated by depicting its application
to testing the relative merits of a new YMCA running program as compared to the program it
replaced. Two test groups were analysed: one used the old YMCA program and the other used the
new running program. The analysis tested the mean values of the dependent variable (time to run
a certain distance) for the two groups. In doing so, ANOVA allowed one to conclude whether there
was a difference between the two programs.
The statistical technique known as the analysis of covariance is an extension of the ANOVA to
take  into  account  the  possible  effects,  on  the  dependent  variable,  of  one  or  more  uncontrolled 
variables  (the  covariates).  In  the  above  test  of  the  YMCA  running  programs,  it  was  determined 
that  the  new  program  appeared  to  be  better  than  the  older  program  because  the  mean  of  the 
variable  considered  (time  to  run  a certain  distance)  was  much  lower  under  the  new  program. 

After  the  test  was  conducted,  it  was  determined  that  the  age  of  the  people  tested  was  not  uniform. 
Age  in  this  case  was  a covariate  and  may  have  had  an  effect  on  the  dependent  variable.  The 
analysis  of  covariance  was  then  applied  to  permit  an  adjustment  to  be  made--to  sharpen  the  test 
results.  The  analysis  of  covariance  uses  regression  equations  to  make  estimates  of  the  dependent 
variable  from  the  known  values  of  the  predictor  variable,  which  in  this  case  was  age. 


Bayes' theorem on inverse probability, or posterior probability, was depicted by the following
examples: As a weatherman, suppose you have to make a forecast of the precipitation conditions
(rain, snow, or neither) at the inauguration in Washington DC one day in advance. All you have is
your trusty barometer. For years you have recorded the pressure on each Jan 19th, 9 AM EST,
and later took notice of what occurred the next day at the inauguration site. If you plotted the
relative frequency of occurrence of the precipitation conditions with respect to pressure, you
might get what is depicted in Figure 1-1.

[FIGURE 1-1: relative frequency f(p) of each precipitation condition plotted against pressure p]


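From climatology, you know that the probability of rain (P_r) is .20, the probability of snow
(P_s) is .30, and the probability of neither occurring (P_n) is .50. Bayes' Theorem can be
written for this situation as:

$$P(\text{Rain}/\text{Press}) = \frac{P_r \, f(\text{press}/\text{Rain})}{P_r \, f(\text{press}/\text{Rain}) + P_s \, f(\text{press}/\text{Snow}) + P_n \, f(\text{press}/\text{Neither})}$$

$$P(\text{Snow}/\text{Press}) = \frac{P_s \, f(\text{press}/\text{Snow})}{P_r \, f(\text{press}/\text{Rain}) + P_s \, f(\text{press}/\text{Snow}) + P_n \, f(\text{press}/\text{Neither})}$$

$$P(\text{Neither}/\text{Press}) = \frac{P_n \, f(\text{press}/\text{Neither})}{P_r \, f(\text{press}/\text{Rain}) + P_s \, f(\text{press}/\text{Snow}) + P_n \, f(\text{press}/\text{Neither})}$$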
Now to determine the probabilities. You measure the pressure on 19 Jan at 9 AM EST and it is
1010 mb. From the graph you determine that the f's for 1010 mb are as follows:


Then 


P(Rain/1010mb)


P(Snow/1010mb) = (.30)(.50)


P(Neither/1010mb) = (.50)(.40) = .20


Your forecast based on the use of Bayes' Theorem would be 10% chance of rain, 40% chance of
snow, and 50% chance of no precipitation for the inauguration.


In  another  example,  suppose  someone  hands  you  a red  ball  and  asks  for  the  probabilities  that 
the  ball  was  taken  from  urn  1,  urn  2,  or  urn  3.  You  are  given  the  following  information: 


URN 1              URN 2              URN 3

Contains           Contains           Contains

50 red balls       90 red balls       25 red balls
50 black balls     10 black balls     75 black balls


You also know a priori that the probability it came from Urn 1 is 1/3, from Urn 2 is 1/3, and from
Urn 3 is 1/3.


By  applying  Bayes'  Theorem  and  using  the  above  information,  the  probabilities  are  easily 
determined  by 


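$$P(\text{Urn } X/\text{red}) = \frac{P(\text{Urn } X) \, P(\text{red}/\text{Urn } X)}{\sum_{i=1}^{3} P(\text{Urn } i) \, P(\text{red}/\text{Urn } i)}$$

Therefore

$$P(\text{Urn 1}/\text{red}) = \frac{(1/3)(.50)}{(1/3)(.50) + (1/3)(.90) + (1/3)(.25)} = \frac{.50}{1.65} = .30$$

$$P(\text{Urn 2}/\text{red}) = \frac{(1/3)(.90)}{(1/3)(.50) + (1/3)(.90) + (1/3)(.25)} = \frac{.90}{1.65} = .55$$

$$P(\text{Urn 3}/\text{red}) = \frac{(1/3)(.25)}{(1/3)(.50) + (1/3)(.90) + (1/3)(.25)} = \frac{.25}{1.65} = .15$$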
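The urn calculation can be checked with a few lines of Python. This is only a sketch: the function name `posterior` is an illustrative choice, and the priors and red-ball fractions are the ones given in the urn example (priors of 1/3 each; 50, 90, and 25 red balls out of 100).

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Bayes' Theorem for a finite set of hypotheses.

    priors[i]      = P(hypothesis i)
    likelihoods[i] = P(evidence / hypothesis i)
    Returns the list of P(hypothesis i / evidence).
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)  # the common denominator of Bayes' Theorem
    return [j / total for j in joint]


# A priori probability of each urn, and the chance of a red ball from each.
priors = [Fraction(1, 3)] * 3
p_red = [Fraction(50, 100), Fraction(90, 100), Fraction(25, 100)]

urn_posteriors = posterior(priors, p_red)
for urn, prob in enumerate(urn_posteriors, start=1):
    print(f"P(Urn {urn}/red) = {float(prob):.2f}")
```

Exact fractions are used so that the three posterior probabilities sum to exactly one; floating-point numbers would serve equally well for a forecast.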


The statistical method of Canonical Correlation was created by an economist in 1935. It is a
method  by  which  one  determines  a linear  combination  of  p predictors  and  a linear  combination  of 
q criterion  variables  such  that  the  correlation  between  these  linear  combinations  in  the  total  sample 
is  as  large  as  possible.  This  technique  has  been  used  successfully  in  the  insurance  business.  To 
illustrate, let's consider the following example: Buyer vs Product. An insurance company will look
at all of its policies sold (Product) and list them under categories such as policy size, policy type,
mode of payment, etc. The company then lists all the factors known about the buyers of the
product (e.g., age, sex, income, employment, etc). From a complete canonical correlation
analysis of all variables, the company can determine, from the factors known about a potential
customer, which type of policy, amount of insurance, and payment plan the customer is most
likely to purchase. Knowing what policies a customer is most likely to buy saves the company
considerable time, money and manpower. Another possible area of use for this method is in
meteorology.  One  should  be  able  to  infer  tomorrow's  set  of  weather  variables  from  today's  set  of 
weather  variables. 

Clustering is a method for identifying significant groups such as would be desired in market
research; e.g., what are the main markets that a company is operating in successfully? Again,
insurance companies have found this information extremely valuable for sales training, designing
new products, etc. In meteorology, clustering can be used in weather typing.

Decision  Theory  is  a technique  that  is  applied  when  there  is  uncertainty.  The  following 
example  will  provide  insight  into  this  technique.  Suppose  you  were  planning  the  next  day's  work 
for  a construction  team.  You  have  a choice  of  three  projects  you  can  plan  for;  however,  it 
depends  on  whether  it  rains  or  doesn't  rain  as  to  what  project  you  can  accomplish.  From  past 
experience,  you  know  what  losses  you  would  incur  if  the  weather  doesn't  agree  with  the  project 
you  planned  for.  By  putting  this  information  in  a decision  matrix  and  applying  decision  theory, 
you  can  minimize  the  potential  for  loss.  Consider  the  following  matrix. 


                      Units of Loss

                     States of Nature
Planning Action      RAIN      NO RAIN

A1                    10          0
A2                     4          4
A3                     0         10

If the forecast for the next day was 100% for rain, you would plan for project 3 or A3. If the
forecast was for 100% no rain, then you would plan for project 1 or A1. However, there is usually
some uncertainty as to whether it will rain or not. For instance, there may be a 50% chance of
rain. By using the probability of rain in conjunction with the matrix, one arrives at the following:


               RAIN    NO RAIN    EXPECTED LOSSES

Probability     .5       .5

A1              10        0       = .5 x 10 + .5 x 0  = 5 units lost
A2               4        4       = .5 x 4  + .5 x 4  = 4 units lost
A3               0       10       = .5 x 0  + .5 x 10 = 5 units lost
Based on this decision theory technique, you would plan for Project A2 because it minimizes
your expected loss. Since at different times other probabilities for rain may be forecast (i.e.,
10%, 20%, 65%, 90%, etc), each can be entered into your calculations to minimize the potential
loss to your organization.
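The expected-loss calculation is easy to repeat for any forecast probability. The sketch below uses the loss matrix from the construction example; the names `LOSS`, `expected_losses`, and `best_action` are illustrative and do not come from the course material.

```python
# Loss matrix from the example: action -> (loss if RAIN, loss if NO RAIN).
LOSS = {"A1": (10, 0), "A2": (4, 4), "A3": (0, 10)}


def expected_losses(p_rain):
    """Expected loss of each planning action for a given probability of rain."""
    return {
        action: p_rain * loss_rain + (1 - p_rain) * loss_dry
        for action, (loss_rain, loss_dry) in LOSS.items()
    }


def best_action(p_rain):
    """Action with the smallest expected loss."""
    losses = expected_losses(p_rain)
    return min(losses, key=losses.get)


for p in (0.1, 0.5, 0.9):
    print(p, expected_losses(p), "->", best_action(p))
```

With a 50% chance of rain the function selects A2, as in the table; a low probability of rain favors A1 and a high probability favors A3.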

The  "Markov  Process"  is  a statistical  approach  where  knowing  the  last  state  of  a process 
tells  you  all  you  need  to  know  to  predict  a future  state. 
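A minimal sketch of the Markov idea follows. The two weather states and the transition probabilities are hypothetical values chosen only for illustration; the point is that the forecast distribution at each step depends on nothing but the distribution at the step before.

```python
STATES = ("dry", "wet")

# Hypothetical transition probabilities: P[i][j] = P(next state j / current state i).
P = [
    [0.8, 0.2],  # from "dry"
    [0.4, 0.6],  # from "wet"
]


def step(dist):
    """Advance a probability distribution over the states by one time step."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def forecast(initial_state, n_steps):
    """State probabilities n_steps ahead, given only the last observed state."""
    dist = [1.0 if s == initial_state else 0.0 for s in STATES]
    for _ in range(n_steps):
        dist = step(dist)
    return dict(zip(STATES, dist))


print(forecast("wet", 1))
print(forecast("dry", 2))
```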


A technique often used in statistics is the "Monte Carlo" method. The Monte Carlo technique
is applied to stochastic processes which are based on a large number of variable factors. An
example of a process which is affected by many variables is the time it takes an individual to
drive home after work. A few of the variables are: time of day, traffic condition, weather, and
the individual's attitude, each of which has a wide range of variability. By keeping records over
a long period of time, one can graph the number of occurrences of particular times it takes to
get home.

Using the technique of Monte Carlo, you can develop a model for simulating the time distribution
in place of collecting the information from actual experience.
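The drive-home example can be sketched as a small simulation. Every number in the model below (a 20-minute base time, an exponentially distributed traffic delay averaging 5 minutes, and a 20% chance of bad weather adding 5 to 15 minutes) is an assumption made up for illustration, not data from the course.

```python
import random


def simulate_drive_time(rng):
    """One simulated trip home, in minutes, under the assumed model."""
    minutes = 20.0                        # assumed base driving time
    minutes += rng.expovariate(1 / 5.0)   # assumed traffic delay, mean 5 minutes
    if rng.random() < 0.2:                # assumed 20% chance of bad weather
        minutes += rng.uniform(5.0, 15.0)
    return minutes


def monte_carlo(n_trials, seed=0):
    """Simulate many trips; a fixed seed makes the run repeatable."""
    rng = random.Random(seed)
    return [simulate_drive_time(rng) for _ in range(n_trials)]


times = monte_carlo(10_000)
mean_time = sum(times) / len(times)
print(f"mean simulated drive time: {mean_time:.1f} minutes")
```

A histogram of `times` plays the role of the graph built from long-term records; under these assumptions the theoretical mean is 20 + 5 + 0.2 x 10 = 27 minutes, which the simulated mean should approximate.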


Multiple Discriminant Analysis, or MDA as it is frequently referred to, involves a more
complicated application (more variables) of Bayes' Theorem. In this method, the variables are
weighted so that the groups separate from each other, thus allowing the probabilities to become
"sharper."


The bivariate normal distribution is an extension of the univariate normal distribution to the
bivariate situation. For a univariate normal distribution of a variable X, every point on a line
represents a possible value of X. For the normal bivariate distribution of variable X and
variable Y, every point on a plane represents a possible pair of values. Application of the
bivariate normal distribution has been made in hurricane predictions.


Most statistical methods require assumptions about the distributions underlying a model.
Nonparametric methods, also called "distribution-free" statistical methods, are used for
making inferences without any assumption as to the form of distribution in the population.

Statistical  methods  using  polynomials  have  been  developed  to  reduce  large  data  arrays  to 
ones  of  manageable  size.  It  is  possible,  because  of  redundancies  in  data,  to  reduce  the  number 
of  observations  needed  to  represent  a given  situation.  Harmonic  functions  and  empirical 
orthogonal  functions  are  used  to  accomplish  this  data  reduction  also.  Orthogonal  polynomials 
have  been  used  to  represent  weather  maps  with  only  a few  numbers. 


Classification  problems  have  long  interested  statisticians.  "Pattern  Recognition"  is  one 
approach  for  determining  which  of  several  groups  a particular  individual  "resembles"  the  most, 
in  terms  of  a specific  set  of  measurable  characteristics.  In  pattern  recognition,  one  has  at 
hand  a sample  from  each  of  K well-defined  populations.  Associated  with  each  individual  are 
measurements  on  P variables  that  are  deemed  to  be  important  in  differentiating  among  the 
several  populations  or  groups.  Now,  we  take  a new  individual  whose  group  membership  is 
unknown,  but  for  whom  we  can  take  measures  on  the  same  P variables.  Using  the  measures, 
one can classify him as a member of one of these K groups according to which he shows
greatest resemblance.

"Regression"  is  the  estimation,  or  prediction,  of  an  unknown  value  of  one  variable  from 
known values of one or more other variables. In the simplest case, one variable is predicted from
a known value of another variable. The known variable is commonly called the independent
variable and the unknown variable, the one to be estimated, is called the dependent variable.
The relationship between the dependent variable and independent variable is given by regression
equations.
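For the simplest case of one independent variable, a regression equation can be fitted by ordinary least squares. The sketch below is illustrative only: the function names and the five data pairs are hypothetical and are not taken from the course.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x_new):
    """Estimate the dependent variable from a known independent value."""
    return intercept + slope * x_new


# Hypothetical data: independent variable x, dependent variable y.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]

intercept, slope = fit_line(x, y)
print(intercept, slope, predict(intercept, slope, 6.0))
```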

Significance  testing  is  another  important  area  of  statistical  analysis.  It  arises  in  a special 
way  through  the  use  of  screening  procedures  which  try  to  find  the  best  predictor  variables. 

Simulations  have  wide  applications  in  statistics  and  various  techniques  will  be  discussed  in 
the  text.  Simulations  are  accomplished  through  modeling.  By  analyzing  past  data,  models  can 
be  developed  which  are  used  to  simulate  actual  observed  conditions.  Once  the  model  has  been 
developed,  one  can  input  into  the  model  various  parameters  and  determine  or  simulate  what  the 
future  (or  predicted)  effects  will  be  of  the  input  parameter  on  the  model.  The  Monte  Carlo 
method  is  a simulation  technique. 

Stochastic processes, as stated earlier, are processes which are affected by a very large
number of variable factors in a nondeterministic (probabilistic) manner.

Another  statistical  subject  covered  in  the  course  was  that  of  transformations.  The  analysis 
of  data  can  be  made  efficient  if  data  is  made  more  compatible  with  the  underlying  models. 

All  of  the  above  statistical  subjects  will  be  discussed  in  more  detail  and  applied  in  various 
combinations  in  the  chapters  of  this  report. 

1.3 Chapter Contents.

This  report  contains  10  chapters.  Each  represents  the  individual  student's  efforts  to  expand 
and  add  details  to  the  statistical  methods  presented  by  Dr.  Miller  during  the  class  sessions. 
Several  chapters  deal  with  a particular  operational  problem  solved  by  employing  statistical 
methods. 

Chapter  2 provides  the  preliminary  mathematics  for  the  study  of  statistical  methods.  It 
gives  an  explanation  on  the  presentation  of  data  bases  used  for  most  statistical  analysis  work. 
Matrices  are  illustrated  since  data  are  normally  presented  in  this  format.  Problems  with  data 
handling,  such  as  missing  or  erroneous  data,  are  discussed.  Two  methods  of  handling  these 
problems  are  shown.  They  are  the  prime  number  scheme  and  the  error  vectors  with  the  logical 
"OR"  statement.  The  concept  of  random  variables  is  applied  to  statistical  editing.  The  purpose 
and  history  of  transformations  are  highlighted,  followed  by  a discussion  on  the  transformation 
to  dummy  variables.  Detailed  instructions  are  provided  for  the  Crout  reduction  procedure,  the 
derivation  of  the  auxiliary  matrix  from  both  symmetrical  and  nonsymmetrical  matrices,  and  the 
calculation  of  the  inverse  matrix.  With  a working  knowledge  of  these  preliminaries,  one  should 
be  able  to  understand  the  basic  principles  which  make  up  the  various  methods  of  statistical 
analysis discussed in the report.

In  chapter  3,  the  topic  of  screening  regression  is  covered.  The  screening  or  stepwise 
procedure  is  a method  of  selecting  significant  independent  variables  and  determining  a rank  order 
listing  of  significant  independent  variables  as  related  to  a dependent  variable.  This  technique 
has  application  in  meteorology  where  the  independent  variables  are  often  considered  as  predictors 
and  the  dependent  variable  is  the  predictand.  The  screening  process  allows  one  to  find  which  of 
a large  number  of  possible  predictors  are  most  significant  and  uses  these  predictors  in  the 
regression  equation  to  predict  future  meteorological  conditions.  Background  information  on  the 
origin  and  development  of  the  screening  process  is  highlighted.  The  details  of  the  screening 
procedure, the tests of significance, and several applications are thoroughly covered.

Chapter  4 discusses  the  application  of  multiple  discriminant  analysis  (MDA)  to  precipitation 
forecasting.  The  rationale  for  the  use  of  MDA  versus  regression  is  covered.  The  chapter  provides 
insight  into  graphical  interpretation,  mathematical  procedures,  selection  of  predictors,  and 
estimation  of  probabilities. 
Chapter 5 looks at the use of Regression Estimation of Event Probabilities (REEP) as a
forecasting technique. The REEP prediction technique is discussed and an experiment using the
technique is examined.

Market  Research  is  covered  in  Chapter  6 with  an  in-depth  discussion  of  canonical  correlation 
and  its  use  in  determining  the  relationship  between  the  buyer  and  the  product.  This  type  of 
statistical  information  is  a valuable  management  tool. 

In chapter 7, a very detailed and thorough analysis of Markov processes and their use in
meteorology is presented. The chapter provides a review of matrix manipulations and concepts.
The Markov chain is defined. The equivalent Markov model developed by Dr. Miller is compared
to the classical Markov model and is found to be much easier to develop and apply to practical
forecasting problems. The chapter shows that the simple yet powerful prediction methods of the
Markov type can successfully be applied to the problem of forecasting the weather at a station,
given only an initial observation from that station. This is known as single-station
forecasting.

Chapter  8 considers  the  problem  of  nonlinearity  and  a method  for  dealing  with  it.  Boolean 
algebra and a property of lattice theory are used to uncover nonlinear relationships.

The  Delphi  technique  is  explained  in  chapter  9.  The  Delphi  technique  is  defined  and  its 
method of application is presented. A detailed discussion of the Delphi process, as it was
applied  to  a class  exercise,  is  provided  as  an  excellent  example  of  the  technique. 

In  chapter  10,  there  is  a write-up  on  the  class  experiment.  The  experiment  applied  a 
statistical  method  to  develop  probability  weather  forecasts  for  Rickenbacker  AFB.  The 
verification  on  independent  data  is  presented. 
CHAPTER 2


PRELIMINARY  MATHEMATICS 
by  CAPT  JISWY  N.  FULFORD 


DATA  NOTATION 

Most  statistical  analyses  work  from  data  bases.  This  data  is  normally  presented  in  a matrix 
format  with  the  variables  along  the  top  and  the  observations  along  the  side;  e.g.: 

A TYPICAL DATA ARRAY

[Data array M: variables across the top, observation numbers 1, 2, 3, ... down the side]

Underlining signifies that M is a matrix.

The variables, X_i (i = 1, ..., p), can be used for items such as temperature, pressure,
visibility, etc., and their values could be the hourly observations of these variables at a
weather station in raw or coded form.


PROBLEMS  WITH  DATA 


When building a matrix of data there are certain problems associated with collecting and
storing it. Raw data seldom comes in a neat form. There can be missing observations, questionable
or illogical information, as well as other gross errors. Passing through the data by eye or with a
computer, one can find missing data or gross errors if the amount of data is small enough.
Corrections could then be made in most instances. If there is a large volume of data, the
following two methods could be used to keep track of missing data. One is the prime number scheme
and the other uses error vectors with the logical "OR" statement. Suppose that the following
matrix represented p variables such as temperature, dewpoint, pressure, etc., with N hourly
observations (zeros signifying missing data).


                          VARIABLES

Observation
Number         X1    X2    X3    X4    X5   ...   Xp

1              27    17     4     5    17   ...   97
2              34     0     0    13     8   ...   34
3               0    21     2     7     0   ...   86
.
.
N              29    17    13     4    15   ...   42

Assign a prime number to each variable such as

X1    X2    X3    X4    X5    X6   ....   Xp
 2     3     5     7    11    13   ....   N_p


Create  a Prime  Vector  whose  elements  are  equal  to  the  product  of  the  prime  numbers  associated  with 
missing  data  in  a given  row  of  observations. 


PRIME
VECTOR                    X1    X2    X3    X4    X5   ...   Xp

a_1 = 0          1        27    17     4     5    17   ...   97
a_2 = 15         2        34     0     0    13     8   ...   34
a_3 = 22         3         0    21     2     7     0   ...   86
.
.
a_N = 13         N        29    17    13     4    15   ...   42

In the example shown, the first observation row has no missing or grossly erroneous data;
therefore, element a_1 of the prime vector is equal to zero. The second element, a_2, of the
prime vector is equal to 15. This is the product of prime numbers 3 and 5; therefore, x_2 and
x_3 of the data matrix are missing or erroneous. Element a_3 of the prime vector is equal to 22,
the product of prime numbers 2 and 11. This means that x_1 and x_5 of the data matrix are missing
or erroneous. This computational method of keeping track of missing data is easy to carry along
as an extra vector for gross editing and for later reference when building a "good" sample of
required variables.
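The prime number scheme can be sketched in a few lines of Python. The function names are illustrative, only the first five variables of the example rows are used, and the convention of storing zero for a row with nothing missing follows the text above.

```python
# Primes assigned to variables X1..X5, as in the text.
PRIMES = [2, 3, 5, 7, 11]


def prime_flag(row, primes=PRIMES, missing=0):
    """Product of the primes of the missing variables in one observation row.

    Returns 0 when nothing is missing (the convention used in the text).
    """
    product = 1
    any_missing = False
    for value, prime in zip(row, primes):
        if value == missing:
            product *= prime
            any_missing = True
    return product if any_missing else 0


def missing_variables(flag, primes=PRIMES):
    """Recover the 1-based variable numbers that a prime-vector element marks."""
    if flag == 0:
        return []
    return [i for i, prime in enumerate(primes, start=1) if flag % prime == 0]


# First five variables of observations 1, 2, and 3 from the example.
data = [
    [27, 17, 4, 5, 17],
    [34, 0, 0, 13, 8],
    [0, 21, 2, 7, 0],
]

flags = [prime_flag(row) for row in data]
print(flags)
print([missing_variables(f) for f in flags])
```

Because every integer factors into primes in only one way, a single stored number per row is enough to recover exactly which variables were missing.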


Another method uses the same size matrix as the data matrix but assigns zeros and ones for good
and bad data. For such a situation the results or "bits" can be packed on the computer word. Zeros
are placed in the elements when the data is good and ones are used when the data is missing or
incorrect. Then a resultant vector is generated using a logical "OR". Example:


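[Example: error vectors e_1 and e_2 for three observations, combined element by element with the
logical "OR" to give a resultant vector]

e_1 and e_2 are the error vectors associated with variables x_1 and x_2.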


The  symbol  for  logical  "OR"  is  6.  The  logic  is  as  follows: 


               e_2
e_1         0       1

 0          0       1
 1          1       1


If either e_1 or e_2 or both are equal to one, then the output is one. The output is zero only if
both e_1 and e_2 are zero. In the example shown, observations 1 and 2 of x_1 and x_2 are good but
observation 3 cannot be used if variable x_2 is required.
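The error-vector method might look as follows in Python. The two error vectors are hypothetical values chosen so that observation 3 is flagged; the function names `combine` and `pack` are also illustrative.

```python
def combine(*error_vectors):
    """Element-by-element logical OR of any number of 0/1 error vectors."""
    return [int(any(bits)) for bits in zip(*error_vectors)]


def pack(bits):
    """Pack a list of 0/1 flags into a single integer 'computer word'."""
    word = 0
    for bit in bits:
        word = (word << 1) | bit
    return word


# Hypothetical error vectors for two variables over three observations
# (0 = good data, 1 = missing or incorrect data).
e1 = [0, 0, 1]
e2 = [0, 0, 0]

resultant = combine(e1, e2)
usable = [obs for obs, bad in enumerate(resultant, start=1) if bad == 0]
print(resultant, usable)

# With packed words, one bitwise OR combines all observations at once.
print(pack(e1) | pack(e2) == pack(resultant))
```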

These two methods are examples of gross and expedient methods. If a thorough edit is desired,
then it should be performed statistically. In order to accomplish a statistical edit, it is
necessary to be able to treat each variable numerically; that is, all qualitative variables must
be converted to random variables. To do this, each observed condition must be assigned a number.
NOTE: A Random Variable is a variable having a specified range of values with definite
probabilities associated with each. An example would be present weather, which consists of
qualitative variables, e.g., fog, snow, rain, thunderstorm, different cloud types, etc. For ease
of operation, each of these transformations can be given the values 0 and 1, referred to as dummy
variables. Quantitative variables may also need to be transformed, e.g., changing wind from
degrees and speed to "u" and "v" components.
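As an illustration of transforming a quantitative variable, wind direction and speed can be converted to components. The sketch assumes the usual meteorological convention, in which direction is the compass bearing the wind blows from, u is positive toward the east, and v is positive toward the north; the function name is hypothetical.

```python
import math


def wind_to_uv(direction_deg, speed):
    """Convert wind direction (degrees, blowing FROM) and speed to (u, v).

    u is the eastward component and v the northward component, in the
    same units as speed.
    """
    rad = math.radians(direction_deg)
    u = -speed * math.sin(rad)
    v = -speed * math.cos(rad)
    return u, v


# A west wind (from 270 degrees) blows toward the east: u > 0, v near 0.
print(wind_to_uv(270.0, 10.0))
# A north wind (from 360 degrees) blows toward the south: u near 0, v < 0.
print(wind_to_uv(360.0, 10.0))
```

The components avoid the discontinuity between 359 and 0 degrees, which makes them better behaved than raw direction in averages and regression equations.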

TRANSFORMATIONS 


The purpose and history of transformations will be discussed before discussing the
transformation to dummy variables.

Purpose  of  Transformations 

Analysis  of  data  will  proceed  easier  if  the  effects  are  additive,  and  the  variability  of  error 
is  symmetrical  and  near  normal.  The  purpose  of  transformations  is  to  approach  these  properties  as 
nearly  as  possible.  Normally  a transformation  which  improves  one  of  the  properties  also  improves 
the  other. 

History 

If N items are each drawn independently and at random from an infinite population where a
proportion P of the items have a property A, then the proportions p_1, p_2, p_3, ... of items
possessing A in the successive samples will be distributed in such a manner that, as the number
of independent samples increases without limit, the average of the p's will approach P and the
mean of their squared deviations from P will approach P(1-P)/N. In the language of statistics
this is expressed by stating that, if a sample of N items is drawn at random from an infinite
population in which a proportion P of the items have an attribute A, and if the proportion of
items possessing A in the sample is denoted by p, then:

$$\text{Expected value of } p = E(p) = P \tag{2-1}$$

$$\text{Variance of } p = V(p) = \frac{P(1-P)}{N} \tag{2-2}$$

$$\text{Standard deviation of } p = \sigma(p) = \sqrt{\frac{P(1-P)}{N}} \tag{2-3}$$

Equation (2-3) or equation (2-2), considered along with equation (2-1), states that observed
proportions p, based on successive independent random samples of size N, may be expected to be
grouped more closely about the true proportion P when N is large than when N is small. For fixed
sample size, their sampling variation about P will be greatest when P equals 1/2 and will
decrease toward zero as either P or 1-P approaches zero.
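Equations (2-1) and (2-2) can be checked by simulation. In the sketch below the population proportion (0.3), the sample size (50), and the number of repeated samples (5000) are arbitrary illustrative choices, and the function names are hypothetical.

```python
import random


def sample_proportion(rng, n, p_true):
    """Proportion of n independent draws that possess the attribute."""
    hits = sum(1 for _ in range(n) if rng.random() < p_true)
    return hits / n


def simulate(n, p_true, n_samples, seed=0):
    """Mean of the sample proportions and their mean squared deviation from P."""
    rng = random.Random(seed)
    props = [sample_proportion(rng, n, p_true) for _ in range(n_samples)]
    mean = sum(props) / n_samples
    var = sum((p - p_true) ** 2 for p in props) / n_samples
    return mean, var


mean_p, var_p = simulate(n=50, p_true=0.3, n_samples=5000)
theory_var = 0.3 * (1 - 0.3) / 50  # equation (2-2)

print(f"mean of p's: {mean_p:.4f} (theory 0.3000)")
print(f"variance of p's: {var_p:.5f} (theory {theory_var:.5f})")
```

Increasing `n` shrinks the simulated variance in proportion to 1/N, which is the grouping "more closely about the true proportion" described above.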

Dummy  Variables 

As stated earlier, in order to accomplish a statistical edit, qualitative variables must be
replaced by numbers. Dummy variables are normally designated by the letter Z. Table 2-1 shows how
the meteorological variable ceiling is transformed into five dummy variables which categorize
ceilings into the height intervals shown.


TABLE 2-1

                                      DUMMY VARIABLES

OBS NO.   CEILING        Z1        Z2         Z3         Z4           Z5
          (Feet)        ≤100     200-400    500-900   1000-2900     ≥3000

1         Unlimited      0         0          0          0            1
2         10,000         0         0          0          0            1
3         5,000          0         0          0          0            1
4         2,000          0         0          0          1            0
5         7,000          0         0          0          0            1
6         0              1         0          0          0            0
7         400            0         1          0          0            0
.
.
N         Unlimited      0         0          0          0            1

As indicated in Table 2-1, ceiling is grouped into five classes. There are no values for numbers
such as 99 or 499 in these limits because ceiling is measured to the nearest hundred feet. Whenever
the ceiling is ≤100 feet, as in observation 6, then dummy variable 1 is assigned a value of one and
dummy variables 2, 3, 4, and 5 are set equal to 0. Whenever the ceiling is in the range 200-400 feet,
as in observation 7, then dummy variable 2 takes on a value of 1 and dummy variables 1, 3, 4, and 5
are equal to 0. If there are N observations of ceiling, then there will be N observations of each
of the five dummy variables.
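The assignment rule of Table 2-1 can be sketched as a small function. This is only an illustration: the function name is hypothetical, and treating an "Unlimited" ceiling as belonging to the highest class follows observation 1 of the table and is an assumption.

```python
def ceiling_to_dummies(ceiling_ft):
    """Map a ceiling (feet, reported to the nearest hundred, or the string
    'Unlimited') to the five dummy variables [Z1, Z2, Z3, Z4, Z5]."""
    upper_bounds = [100, 400, 900, 2900]      # upper limits of classes 1 to 4
    z = [0, 0, 0, 0, 0]
    if ceiling_ft == "Unlimited":
        z[4] = 1                              # assumption: unlimited counts as >= 3000 ft
        return z
    for k, bound in enumerate(upper_bounds):
        if ceiling_ft <= bound:
            z[k] = 1
            return z
    z[4] = 1                                  # 3000 ft and above
    return z

print(ceiling_to_dummies(400))   # observation 7 of Table 2-1
```

Exactly one of the five dummy variables is set for each observation, so N ceiling reports yield N values of each Z.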

If a continuous variable such as pressure or temperature is to be expressed with dummy
variables, then it is necessary to separate the continuous values into classes. D. R. Cox (1957) devised
a method for dividing a continuous variable into K classes so that the grouping error is minimized
under certain conditions for a stated K. This method computes K-1 limits and then assigns dummy
variables exactly as before. The limits are computed as follows: Let $X_i$, i = 1, 2, ..., N, be
the N values of the variable to be transformed to K dummy variables. Calculate the mean

$$\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i$$

and the standard deviation σ of the $X_i$. The K-1 limits are then obtained by using these values to
enter Table 2-2. For example, if three dummy variables are desired (K = 3) and $\bar{X}$ = 1, σ = 2,
then the two limits are 1 - .612 x 2 and 1 + .612 x 2, or -.224 and +2.224. Obviously, there is some
loss of resolution, but the pay-off is a tremendous increase in speed of computation if all variables
are dummy variables. For a variable such as temperature, it would perhaps take six divisions (K = 6)
to include the necessary resolution. This would transform one integer $X_j$ into six dummy variables
$Z_i$, i = 1, ..., 6.
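The K = 3 example can be reproduced with a short sketch. Only the factor .612 quoted in the text is used; factors for other K would have to be taken from Table 2-2. The function names and the dictionary `COX_FACTORS` are illustrative assumptions.

```python
# Factors that multiply sigma; only the K = 3 pair is quoted in the text.
COX_FACTORS = {3: (-0.612, 0.612)}

def limits_from_moments(mean, sigma, K):
    """Return the K-1 class limits mean + factor * sigma."""
    return [mean + f * sigma for f in COX_FACTORS[K]]

def to_dummies(x, limits):
    """Assign x to one of len(limits) + 1 classes and return the dummy variables."""
    z = [0] * (len(limits) + 1)
    k = sum(1 for lim in limits if x > lim)   # number of limits lying below x
    z[k] = 1
    return z

limits = limits_from_moments(1.0, 2.0, 3)
print(limits)                    # approximately [-0.224, 2.224]
print(to_dummies(0.5, limits))   # falls in the middle class
```

Whether a value exactly equal to a limit belongs to the lower or upper class is not stated in the text; the sketch places it in the lower class.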


TABLE 2-2

FACTORS FOR DETERMINING LIMITS


STATISTICAL  ANALYSES 


To do a thorough editing or to do most statistical analysis, the following quantities are
required:

$$X_{1,1} + X_{1,2} + \cdots + X_{1,N} = \sum_{i=1}^{N} X_{1,i}$$

$$X_{2,1} + X_{2,2} + \cdots + X_{2,N} = \sum_{i=1}^{N} X_{2,i}$$

$$X_{P,1} + X_{P,2} + \cdots + X_{P,N} = \sum_{i=1}^{N} X_{P,i}$$

SUM OF SQUARES

$$X_{1,1}^2 + X_{1,2}^2 + \cdots + X_{1,N}^2 = \sum_{i=1}^{N} X_{1,i}^2$$

over all P variables;

SUM OF CROSS PRODUCTS

$$X_{1,1}X_{2,1} + X_{1,2}X_{2,2} + \cdots + X_{1,N}X_{2,N} = \sum_{i=1}^{N} X_{1,i}X_{2,i}$$

over all pairs, leading to the elements of the following matrix:

$$\begin{bmatrix}
\sum X_{0,i}^2 & \sum X_{0,i}X_{1,i} & \sum X_{0,i}X_{2,i} & \cdots & \sum X_{0,i}X_{P,i} \\
\sum X_{0,i}X_{1,i} & \sum X_{1,i}^2 & \sum X_{1,i}X_{2,i} & \cdots & \sum X_{1,i}X_{P,i} \\
\sum X_{0,i}X_{2,i} & \sum X_{1,i}X_{2,i} & \sum X_{2,i}^2 & \cdots & \sum X_{2,i}X_{P,i} \\
\vdots & \vdots & \vdots & & \vdots \\
\sum X_{0,i}X_{P,i} & \sum X_{1,i}X_{P,i} & \sum X_{2,i}X_{P,i} & \cdots & \sum X_{P,i}^2
\end{bmatrix}$$

where every sum runs over i = 1, ..., N and $X_{0,i} = 1$, i = 1, ..., N. Thus $\sum_{i=1}^{N} X_{0,i}^2 = N$.

The above matrix is sufficient to perform most statistical analyses. A reasonable sample where N =
10,000 observations and P = 100 variables would require over 50,000,000 multiplications (taking
account of symmetry in the matrix). There are ways to avoid this large amount of computation. One is by the
use of dummy variables previously shown and the other is by the use of screening methods which will
be discussed in another section. Now we will examine a method of editing one of the $X_i$ variables,
where i = 1, ..., P. The particular $X_i$ will be expressed as Y. It is possible to estimate the value of
Y from the other P-1 variables as

$$\hat{Y} = B_0X_0 + B_1X_1 + \cdots + B_pX_p \tag{2-4}$$

where Y is omitted from the $X_i$, i = 1, ..., P variables, such that $\sum_{i=1}^{N} (Y_i - \hat{Y}_i)^2$ is a minimum.
$\hat{Y}$ is defined as the estimate of Y, and $X_0 \equiv 1$. Notice that with the inclusion of $X_0$, the above
equation now has P terms since the particular $X_i$ = Y is not included. In order to determine equation (2-4), it will
be advantageous to employ the Crout method.
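The matrix of sums of squares and cross products can be accumulated in a single pass over the observations. The sketch below is illustrative only (the function names are hypothetical); it uses the ten dummy-variable observations tabulated in the Crout example of this chapter and fills only the upper triangle before mirroring it, which is the symmetry saving mentioned above.

```python
def cross_product_matrix(rows):
    """rows: list of observations [X1, ..., XP]; X0 = 1 is prepended to each.
    Returns the (P+1) x (P+1) matrix of sums of squares and cross products."""
    data = [[1.0] + list(r) for r in rows]
    p = len(data[0])
    m = [[0.0] * p for _ in range(p)]
    for obs in data:
        for j in range(p):
            for k in range(j, p):          # upper triangle only
                m[j][k] += obs[j] * obs[k]
    for j in range(p):                     # mirror into the lower triangle
        for k in range(j):
            m[j][k] = m[k][j]
    return m

def multiplication_count(N, P):
    """Multiplications needed when symmetry is used, with X0 included."""
    return N * (P + 1) * (P + 2) // 2

# (Z1, Z2, Z3 = Y) for the ten observations of the Crout example.
obs = [(1, 0, 0), (1, 1, 1), (1, 0, 1), (0, 1, 0), (1, 1, 1),
       (0, 0, 0), (0, 0, 0), (0, 1, 1), (0, 0, 0), (1, 0, 0)]
for row in cross_product_matrix(obs):
    print(row)
print(multiplication_count(10000, 100))
```

For N = 10,000 and P = 100 the count is a little over 51 million, consistent with the figure of "over 50,000,000" quoted in the text.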

CROUT  METHOD 


The  Crout  method  is  a modification  of  the  Gauss  reduction.  It  is  well  suited  to  the  use  of  desk 
calculators  and  electronic  computers.  In  addition,  the  storage  of  auxiliary  data  is  reduced. 

The following data are used to derive the initial matrix. Z's instead of X's are used since
the variables are assumed to be in dummy form.

Variables $Z_i$, i = 1, ..., 3.

                      VARIABLES
Observation
Number          Z0     Z1     Z2     Z3 = Y

    1            1      1      0      0
    2            1      1      1      1
    3            1      1      0      1
    4            1      0      1      0
    5            1      1      1      1
    6            1      0      0      0
    7            1      0      0      0
    8            1      0      1      1
    9            1      0      0      0
   10            1      1      0      0

Thus,

$$\sum_{i=1}^{10} Z_{0,i}^2 = 10$$

$$\sum_{i=1}^{10} Z_{0,i}Z_{1,i} = 5 \qquad \sum_{i=1}^{10} Z_{1,i}^2 = 5$$

$$\sum_{i=1}^{10} Z_{0,i}Z_{2,i} = 4 \qquad \sum_{i=1}^{10} Z_{1,i}Z_{2,i} = 2 \qquad \sum_{i=1}^{10} Z_{2,i}^2 = 4$$

$$\sum_{i=1}^{10} Z_{0,i}Z_{3,i} = 4 \qquad \sum_{i=1}^{10} Z_{1,i}Z_{3,i} = 3 \qquad \sum_{i=1}^{10} Z_{2,i}Z_{3,i} = 3 \qquad \sum_{i=1}^{10} Z_{3,i}^2 = 4$$


If a set of linear equations were being solved, then this matrix of sums would represent the initial
matrix in the Crout method. The computations are simplified since the cross-product matrix is symmetrical,
and is a special case of the general Crout solution. A second example of the complete Crout method
will be shown after the symmetrical case.

Auxiliary  Matrix 

The first step is to derive from a given initial matrix an auxiliary matrix A. As the steps
in deriving the auxiliary matrix progress, each step is dependent upon the preceding steps. The
auxiliary matrix is evolved in a right-angle pattern branching from the main diagonal, i.e., the
diagonal that starts at the upper left-hand corner and slopes downward to the right. Because the
initial matrix of the example is symmetrical, only its lower triangle is written:

10
 5   5
 4   2   4
 4   3   3   4

To distinguish between the usual terms, column and row, which denote the entire vertical and horizontal
arrays, we shall use terms for partial columns and rows. Therefore, we define a vertical block (Symbol:
$V_t$, t = 1, 2, ..., n) as that part of a column consisting of the diagonal element and the elements
beneath it. A horizontal block (Symbol: $H_t$, t = 1, 2, ..., n) is defined as that part of a row which
lies to the right of the diagonal element. These definitions are illustrated schematically in the
following diagram.

The auxiliary matrix is completed in successive stages beginning with the first vertical block $V_1$
and then forming the first horizontal block $H_1$, next $V_2$ and then $H_2$, next $V_3$ and then $H_3$, and
continuing in a similar fashion throughout, alternating from vertical to horizontal and then to the
next vertical and horizontal, according to the following rules:

$$a'_{ij} = a_{ij} - \sum_{k=1}^{j-1} a'_{ik}a'_{kj} \qquad (i \ge j) \tag{2-5}$$

$$a'_{ij} = \frac{a'_{ji}}{a'_{ii}} \qquad (i < j) \tag{2-6}$$

A is the auxiliary matrix to be determined by equations (2-5) and (2-6) using the previously
stated order.


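In column $V_1$, $a'_{i1} = a_{i1}$ since j - 1 = 0. Using equation (2-5):

$$a'_{11} = 10 \qquad a'_{21} = 5 \qquad a'_{31} = 4 \qquad a'_{41} = 4$$

Row $H_1$ uses equation (2-6):

$$\begin{aligned}
a'_{12} &= a'_{21}/a'_{11} = 5/10 = .5 \\
a'_{13} &= a'_{31}/a'_{11} = 4/10 = .4 \\
a'_{14} &= a'_{41}/a'_{11} = 4/10 = .4
\end{aligned}$$

Using equation (2-5) - Column $V_2$

$$\begin{aligned}
a'_{22} &= a_{22} - a'_{21}a'_{12} = 5 - (5 \times .5) = 2.5 \\
a'_{32} &= a_{32} - a'_{31}a'_{12} = 2 - (4 \times .5) = 0 \\
a'_{42} &= a_{42} - a'_{41}a'_{12} = 3 - (4 \times .5) = 1
\end{aligned}$$

Using equation (2-6) - Row $H_2$

$$\begin{aligned}
a'_{23} &= a'_{32}/a'_{22} = 0/2.5 = 0 \\
a'_{24} &= a'_{42}/a'_{22} = 1/2.5 = .4
\end{aligned}$$

Using equation (2-5) - Column $V_3$

$$\begin{aligned}
a'_{33} &= a_{33} - a'_{31}a'_{13} - a'_{32}a'_{23} = 4 - (4 \times .4) - (0 \times 0) = 2.4 \\
a'_{43} &= a_{43} - a'_{41}a'_{13} - a'_{42}a'_{23} = 3 - (4 \times .4) - (1 \times 0) = 1.4
\end{aligned}$$

Using equation (2-6) - Row $H_3$

$$a'_{34} = a'_{43}/a'_{33} = 1.4/2.4 = .58333$$

It can be shown that

$$a'_{44} = 1.183 = \sum_{i=1}^{N} (Y_i - \hat{Y}_i)^2.$$

The solution to the coefficients of the estimate equation $\hat{Y} = B_0 + B_1Z_1 + B_2Z_2$ follows:

$$\begin{aligned}
B_2 &= a'_{34} = .58333 \\
B_1 &= a'_{24} - a'_{23}B_2 = .4 - (0 \times .58333) = .4 \\
B_0 &= a'_{14} - a'_{12}B_1 - a'_{13}B_2 = .4 - (.5 \times .4) - (.4 \times .58333) = -.0333
\end{aligned}$$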

The above procedure for calculating coefficients will also apply for continuous variables.

To edit the dummy variable observations, examine $|Y - \hat{Y}|$. If $\hat{Y}$ is, say, > .99, reject a value of zero
for Y. If $\hat{Y}$ is, say, < .01, reject a value of one for Y.

Another feature of the auxiliary matrix that can be used for a statistical analysis, if the variables
are continuous, is the residual variance obtained from its last diagonal element,

$$\frac{1}{N}\sum_{i=1}^{N} (Y_i - \hat{Y}_i)^2.$$

Determinant 

The determinant of the initial matrix is equal to the product of the diagonal elements of the
auxiliary matrix. The determinant of our initial matrix is therefore equal to 10 x 2.5 x 2.4 x
1.183 = 70.98.

Note: Since $\hat{Y}$ is the probability that Y = 1, you may use the following procedure to eliminate
values of $\hat{Y}$ > 1 or < 0.

If $\hat{Y}$ < 0 set $\hat{Y}$ = 0.

If $\hat{Y}$ > 1 set $\hat{Y}$ = 1.
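The symmetrical Crout reduction worked through above can be sketched in a few lines. This is an illustrative implementation under the assumption that the last row and column of the initial matrix belong to Y; the function names are hypothetical, and no pivoting is attempted, so every diagonal element of the auxiliary matrix must be nonzero.

```python
def crout_symmetric(m):
    """Auxiliary matrix of a symmetric matrix m, built in the order V1, H1, V2, H2, ..."""
    n = len(m)
    aux = [[0.0] * n for _ in range(n)]
    for t in range(n):
        for i in range(t, n):              # vertical block V_t: diagonal and below
            aux[i][t] = m[i][t] - sum(aux[i][k] * aux[k][t] for k in range(t))
        for j in range(t + 1, n):          # horizontal block H_t: symmetric shortcut
            aux[t][j] = aux[j][t] / aux[t][t]
    return aux

def regression_from_cross_products(m):
    """Return (coefficients B0.., residual sum of squares, determinant)."""
    aux = crout_symmetric(m)
    n = len(m)
    p = n - 1                              # number of coefficients
    b = [0.0] * p
    for i in range(p - 1, -1, -1):         # back substitution using the last column
        b[i] = aux[i][n - 1] - sum(aux[i][k] * b[k] for k in range(i + 1, p))
    det = 1.0
    for i in range(n):
        det *= aux[i][i]                   # product of the diagonal elements
    return b, aux[n - 1][n - 1], det

def clip_probability(y_hat):
    """Force an estimate into the interval [0, 1] as described in the Note."""
    return min(1.0, max(0.0, y_hat))

initial = [[10, 5, 4, 4], [5, 5, 2, 3], [4, 2, 4, 3], [4, 3, 3, 4]]
coeffs, sse, det = regression_from_cross_products(initial)
print(coeffs, sse, det)
```

Carried at full precision, the last diagonal element is 71/60 (about 1.18333), so the product of the diagonal elements is 71; the value 70.98 in the text reflects rounding that element to 1.183.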

Non-Symmetrical Matrix

The procedure to determine the auxiliary matrix for a non-symmetrical matrix and the steps for
finding the inverse will now be shown. Suppose we have the following linear equations:

as follows:

The augmented matrix is as follows:

Using the coefficient matrix as our initial matrix, we then use the following equations to determine
the auxiliary matrix:

$$a'_{ij} = a_{ij} - \sum_{k=1}^{j-1} a'_{ik}a'_{kj} \qquad (i \ge j) \tag{2-7}$$

$$a'_{ij} = \frac{1}{a'_{ii}}\left[a_{ij} - \sum_{k=1}^{i-1} a'_{ik}a'_{kj}\right] \qquad (i < j) \tag{2-8}$$


Equation (2-7) is identical to equation (2-5) used previously. Equation (2-8) is different since
the initial matrix is non-symmetrical. The format and order of determining the elements of A are
identical to the previous case. Therefore, a detailed explanation is not given, but by using
equations (2-7) and (2-8) A is as follows:

The next step is to determine the c' column, which will then be used to determine the solution column
x whose elements are the required values of x, y, z or, using different notation, $x_1$, $x_2$, $x_3$. The
following two equations are used to determine the solution:

$$c'_i = \frac{1}{a'_{ii}}\left[c_i - \sum_{k=1}^{i-1} a'_{ik}c'_k\right] \tag{2-9}$$

$$x_i = c'_i - \sum_{k=i+1}^{n} a'_{ik}x_k \tag{2-10}$$

Using equation (2-9)

$$\begin{aligned}
c'_1 &= c_1/a'_{11} = 8/2 = 4 \\
c'_2 &= (c_2 - a'_{21}c'_1)/a'_{22} = (7 - (1 \times 4))/2.5 = 1.2 \\
c'_3 &= (c_3 - a'_{31}c'_1 - a'_{32}c'_2)/a'_{33} = (5 - (1 \times 4) - (-.5 \times 1.2))/1.6 = 1
\end{aligned}$$

Using equation (2-10)

$$\begin{aligned}
x_3 &= c'_3 = 1 \\
x_2 &= c'_2 - a'_{23}x_3 = 1.2 - (.2 \times 1) = 1 \\
x_1 &= c'_1 - a'_{12}x_2 - a'_{13}x_3 = 4 - (1.5 \times 1) - (.5 \times 1) = 2
\end{aligned}$$

Therefore the solution to the original set of equations is $x_1 = 2$, $x_2 = 1$, $x_3 = 1$.

Inverse Matrix Calculation


The procedure for determining the inverse matrix is to set $c_k$ = 1 and all the other c values =
0, where k = 1, ..., 3. Then the resulting x column becomes the kth column of the inverse matrix. In the
previous example the c columns now become

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Using the first column of c, (1, 0, 0), and equations (2-9) and (2-10):

$$\begin{aligned}
c'_1 &= 1/2 = .5 \\
c'_2 &= (0 - (1 \times .5))/2.5 = -.2 \\
c'_3 &= (0 - (1 \times .5) - (-.5 \times -.2))/1.6 = -.375 \\
x_3 &= -.375 \\
x_2 &= -.2 - (.2 \times -.375) = -.125 \\
x_1 &= .5 - (1.5 \times -.125) - (.5 \times -.375) = .875
\end{aligned}$$

Using the second column, (0, 1, 0), and equations (2-9) and (2-10) yields the following:

$$\begin{aligned}
c'_1 &= 0 \qquad c'_2 = .4 \qquad c'_3 = .125 \\
x_3 &= .125 \\
x_2 &= .375 \\
x_1 &= -.625
\end{aligned}$$

Using the third column, (0, 0, 1), and equations (2-9) and (2-10) yields the following:

$$\begin{aligned}
c'_1 &= 0 \qquad c'_2 = 0 \qquad c'_3 = .625 \\
x_3 &= .625 \\
x_2 &= -.125 \\
x_1 &= -.125
\end{aligned}$$

The inverse matrix is therefore:

$$A^{-1} = (.125)\begin{bmatrix} 7 & -5 & -1 \\ -1 & 3 & -1 \\ -3 & 1 & 5 \end{bmatrix}$$

so that

$$\begin{aligned}
x_1 &= .875c_1 - .625c_2 - .125c_3 \\
x_2 &= -.125c_1 + .375c_2 - .125c_3 \\
x_3 &= -.375c_1 + .125c_2 + .625c_3
\end{aligned}$$

giving $x_1 = 2$, $x_2 = 1$, and $x_3 = 1$.
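The general Crout steps, equations (2-7) through (2-10), together with the inverse procedure, can be sketched as follows. The coefficient matrix and right-hand side used here are not quoted from the text: they are reconstructed from the worked numbers ($a'_{11}$ = 2, c = 8, 7, 5, and the stated inverse) and should be read as an assumption. Function names are illustrative, and no pivoting is done, so the diagonal of the auxiliary matrix must be nonzero.

```python
def crout_auxiliary(a):
    """Auxiliary matrix by equations (2-7) and (2-8), in the order V1, H1, V2, H2, ..."""
    n = len(a)
    aux = [[0.0] * n for _ in range(n)]
    for t in range(n):
        for i in range(t, n):              # (2-7): diagonal and below
            aux[i][t] = a[i][t] - sum(aux[i][k] * aux[k][t] for k in range(t))
        for j in range(t + 1, n):          # (2-8): right of the diagonal
            aux[t][j] = (a[t][j] - sum(aux[t][k] * aux[k][j] for k in range(t))) / aux[t][t]
    return aux

def crout_solve(aux, c):
    """Solve for x given the auxiliary matrix and a right-hand column c."""
    n = len(aux)
    cp = [0.0] * n
    for i in range(n):                     # (2-9): forward pass for c'
        cp[i] = (c[i] - sum(aux[i][k] * cp[k] for k in range(i))) / aux[i][i]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):         # (2-10): back substitution
        x[i] = cp[i] - sum(aux[i][k] * x[k] for k in range(i + 1, n))
    return x

def crout_inverse(a):
    """Inverse built column by column from unit right-hand columns."""
    n = len(a)
    aux = crout_auxiliary(a)
    cols = [crout_solve(aux, [1.0 if i == k else 0.0 for i in range(n)]) for k in range(n)]
    return [[cols[k][i] for k in range(n)] for i in range(n)]

A = [[2, 3, 1], [1, 4, 1], [1, 1, 2]]      # reconstructed coefficient matrix (assumption)
c = [8, 7, 5]
print(crout_solve(crout_auxiliary(A), c))
print(crout_inverse(A))
```

Because the auxiliary matrix is computed once and reused for every right-hand column, the same routine serves both the single solution and the inverse.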

The term .125 or 1/8 was common to all elements in the inverse and was therefore factored out. The
determination of an inverse matrix is particularly desirable when the set is to be solved for many
distinct sets of right-hand members. The inverse matrix is also useful in getting the standard
error of the regression coefficients (see Snedecor, 1946).

1. Introduction

The screening (or stepwise) procedure is a means of selecting significant independent variables
and determining a rank order listing of these variables as related to a dependent variable. In
meteorology these independent variables ($X_m$, m = 1, ..., M) are often considered as predictors and the
dependent variable (Y) as the predictand. Thus, the stepwise procedure will consider, in turn, all
the possible predictors by running tests on each independent variable and selecting the most
significant variables. The procedure orders the independent variables by selecting the most significant
first, the next most significant next, and so forth. The order is not necessarily optimum. The
selected independent variables are then used in the form of a multiple regression equation to predict the
expected value of a particular dependent variable (Y).

The number of variables considered to forecast phenomena such as severe storms, pressure
patterns, hurricanes and other meteorological variables is often considerable. The desired approach
is to find which of the possible predictors are most significant and use them in the regression
equation to predict future conditions of (Y). The screening process allows us to reach this goal.


Other reasons for using the screening process are as follows:

1) Meteorologists usually have a "feel" for the predictors which are most significant, and
screening regression can be used to confirm this selection of variables based on the data.

2) Equations with fewer variables are easier to understand and hence more likely to gain
acceptance and to be used.

3) A subset of variables can provide a better prediction equation than the full set, even
though the full set has a higher multiple correlation coefficient (R). The primary reason for this
is that after you've considered a number of variables, any increase in the number of variables used
may only increase the amount of shrinkage on independent data. Thus, a point is reached where the
shrinkage occurs faster than R increases. It is therefore best to quit screening before this point
is reached.

Prior to initiating the screening process, the set of possible independent variables should be
shown to relate to the dependent variable(s). Experience or preliminary investigations can
determine this. This is necessary because the use of regression analysis to "find" relationships, where
no physical facts show the relationship to exist, frequently leads to less than desirable results.

Also remember that the final equation is a result of the data base used to develop it. Changes
in the data base (e.g., additional data becomes available, new predictors are found, etc.) will
require periodic recomputation of the regression equation(s). Similarly, different equations will
often arise when considering different time steps (i.e., 12, 24 or 48 hours in the future),
indicating that some predictors are stronger than others for these various time steps.

The screening technique is not a new idea; it has its origins back in the early 1940's with J. G.
Bryan. It was further developed in the mid and late 1950's by M. A. Efroymson and R. G. Miller.

More recently, in the mid to late 1960's and early 1970's, it has been used in meteorology
by the Techniques Development Laboratory (TDL) with M. A. Alaka, et al., the Air Force Global Weather
Central (AFGWC) with R. C. Miller and others, and the National Severe Storms Forecast Center (NSSFC)
with C. L. David and others. Many other groups have applied the procedure to a great many
meteorological elements. Also, applications have not just been confined to the field of meteorology;
many other fields have also used screening regression. However, these other applications are beyond
the scope of this paper. I am sure you can visualize the utilization of the screening technique in
such fields as insurance, the stock markets, and public opinion polling, to mention only a few.


Before  we  can  consider  the  screening  procedure  itself,  we  must  first  consider  some  of  the  basic 
concepts  such  as  simple  linear  regression  and  multiple  linear  regression.  The  next  few  pages  are 
devoted  to  these  preliminaries.  Following  that,  we  consider  the  screening  procedure  itself,  test  of 
significance,  and  applications. 

2.  Linear  Regression 

First, consider the simple linear regression equation consisting of only one independent variable
(X) and one dependent variable (Y), where

$$\hat{Y} = a + bX \tag{3-1}$$

Here the hat (^) represents estimated values.

This equation represents the best fitting straight line for a set of points regressing Y on X.

Of course, a is the intercept on the Y axis and b is the slope of the line described by the equation.

By describing this line as the line of best fit for a set of data points, we have either visually
(for simple cases) or mathematically attempted to minimize the deviations of the points from the
line. Thus the resultant linear equation gives us the best prediction of Y for a given value of X.
Actually, the criterion of goodness or best fit that is employed is the principle of least squares.
Here the best fitting line is that one which minimizes the sum of squares of the deviations of the
observed values of Y from those predicted. Expressed mathematically, we wish to minimize

$$SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \tag{3-2}$$

where SSE has the common name Sum of Squares for Error, $\hat{Y}_i$ is the estimated value determined
from the linear equation, and $Y_i$ is an observed value for observation i.

For this simple linear equation, the estimated value of b (the slope) for a sampling of data
points can be found by solving the following equation:

$$b = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}
   = \frac{n\sum_{i=1}^{n} X_iY_i - \left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2} \tag{3-3}$$

Then $a = \bar{Y} - b\bar{X}$, where the bar ($\bar{X}$ and $\bar{Y}$) refers to mean values of observed data. Knowing the
estimated values of a and b, we can now determine estimated values of Y for given values of X.
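Equations (3-2) and (3-3) translate directly into code. The following is a minimal sketch with hypothetical function names and made-up data, intended only to show the computational form of the slope and intercept.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b, using the computational form of (3-3)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = sum_y / n - b * sum_x / n          # a = Ybar - b * Xbar
    return a, b

def sum_squares_error(xs, ys, a, b):
    """SSE of equation (3-2) for the line a + b X."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.0, 1.0]
a, b = fit_line(xs, ys)
print(a, b, sum_squares_error(xs, ys, a, b))
```

Any other choice of a and b gives a larger SSE for the same data, which is what "best fit" means under the least-squares criterion.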

3.  Multiple  Regression 

The multiple regression equation quickly becomes complex when a large number M of independent
variables ($X_m$, m = 1, ..., M) are considered. Computations considering just two variables
(one independent and one dependent) can easily be done on a desk calculator if the data sampling
isn't too large. However, when considering the many variables which may be used in multiple
regression problems, the number of computations soon becomes overwhelming. The modern digital
computer and the "canned" multiple regression computer programs have eliminated the severe
restriction on the number of variables which can be used.

A typical multiple regression equation would have the following form:

$$\hat{Y} = a + b_1X_1 + b_2X_2 + b_3X_3 + \cdots + b_mX_m + \cdots + b_MX_M \tag{3-4}$$

Here M would be the number of predictors considered. It could also have a form similar to:

$$\hat{Y} = a + b_1X_1 + b_2X_2 + b_3X_1^2 + b_4X_2^2 + b_5X_1X_2 \tag{3-5}$$

This is still considered a linear model, since the term linear means that the model is linear with
respect to the coefficients ($b_m$). The regression coefficients $b_m$ (m = 1, 2, 3, ..., M) are
determined using the method of least squares.

When determining coefficients for a multiple regression equation, the method of least squares
can be applied in several ways. For example, the Crout technique could be used with dummy variables
as described in other papers in this report (see Chapters 2 and 7). The Gaussian elimination
technique could also be used.


4.  Screening  Regression 

Textbooks such as Draper and Smith (1966) discuss several screening procedures, for example:
(a) all possible regressions, (b) backward elimination, (c) forward selection, (d) stepwise
regression and others. However, descriptions and comparisons of the various methods will be left
to the textbooks. Since screening is an improved version of the forward-selection procedure, we
will consider it in more detail. Miller (1962) defines this procedure as:

"The  method  of  selecting  predictors  in  a 
forward  stepwise  manner  from  a set  of  possible 
predictors  where  the  criterion  for  selection 
is  the  partial  correlation  coefficient." 

The screening procedure begins by selecting the individual independent variable which is the "best"
predictor, namely the predictor that maximizes the correlation coefficient. The correlation
coefficient squared is the proportion of the variation explained by the predictor. Next, the
screening procedure adds the variables to the equation sequentially, in order of importance.


At each step, the variable added is the one which increases the explained sum of squares (and,
hence, $R^2$) or equivalently reduces the residual sum of squares by the largest amount. The "best"
set of variables may not be included in the final equation as a result of this procedure. However,
the procedure provides an efficient method of developing "good" regression equations.

Remember, however, that when you select and test single predictors, some of these single
predictors will not be selected, while together they might contain significant information. Therefore,
you may want to make significance tests on combined variables as well.

However,  Miller  (1962)  states  that, 

"In  practice,  ...  it  has  been  found  that 
F*  (F  when  predictors  are  selected)  tends  to 
work  well  as  a significance  test  only  when 
variables  are  considered  singly.  This  may  be 
a consequence of the fact that a bias is introduced
in the regression coefficients as a result
of selection."

We are now ready to select the first predictor, which we will label $X^{(1)}$, as described by Miller
(1958) and Miller (1962). First, compute the total sum of squares of deviations from the mean (SST)
for variates Y and $X_m$ (m = 1, ..., M) and the total sum of products of deviations from the means (SPT)
for variates Y and $X_m$ (m = 1, ..., M). Mathematically the above is described as:

$$SST(Y) = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \tag{3-7}$$

$$SST(X_m) = \sum_{i=1}^{n} (X_{mi} - \bar{X}_m)^2, \qquad m = 1, \ldots, M \tag{3-8}$$

$$SPT(YX_m) = \sum_{i=1}^{n} (Y_i - \bar{Y})(X_{mi} - \bar{X}_m), \qquad m = 1, \ldots, M \tag{3-9}$$

Note that (3-7) and (3-8) are related to the variances of Y and $X_m$ respectively; i.e.,
Var(Y) = SST(Y)/n and Var($X_m$) = SST($X_m$)/n. Also (3-9) is similar to the covariance of Y and $X_m$, i.e.,
Cov($YX_m$) = SPT($YX_m$)/n. Here, as before, n represents the sample size considered and m represents the
predictor being considered. In Miller (1958) the multiple correlation coefficient ($R^2$) is described
as:

$$R^2 = \frac{\mathrm{Cov}^2(YX_m)}{\mathrm{Var}(Y)\,\mathrm{Var}(X_m)} \tag{3-10}$$

This was the criterion for selection used in that paper. However, in Miller (1962) the
equivalent criterion for selection is described:

$$\frac{\mathrm{Cov}^2(YX_m)}{\mathrm{Var}(Y)\,\mathrm{Var}(X_m)} \;\longleftrightarrow\; \frac{SSF(X_m)}{SSR(X_m)} \tag{3-11}$$

The two criteria rank the predictors identically, since $SSF/SSR = R^2/(1 - R^2)$ increases with $R^2$.
Here $SSF(X_m)$ is the fitted sum of squares for Y on predictor $X_m$ and $SSR(X_m)$ is the residual sum of
squares after fitting Y with predictor $X_m$. The values for these two are determined by the following
formulas:

$$SSF(X_m) = \frac{[SPT(YX_m)]^2}{SST(X_m)} \tag{3-12}$$

$$SSR(X_m) = SST(Y) - SSF(X_m) \tag{3-13}$$

Now use the following criterion to select the first predictor from all possible (M) predictors:

$$\frac{SSF(X^{(1)})}{SSR(X^{(1)})} \ge \frac{SSF(X_m)}{SSR(X_m)} \qquad \text{for all } m = 1, \ldots, M \tag{3-14}$$
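The selection of the first predictor by equations (3-7) through (3-14) can be sketched as follows. The function names and the small data set are hypothetical; the ratio is taken as infinite when a predictor fits Y exactly, which is a convention adopted only for this sketch.

```python
def sst(v):
    """Total sum of squares of deviations from the mean, as in (3-7) and (3-8)."""
    mean = sum(v) / len(v)
    return sum((x - mean) ** 2 for x in v)

def spt(u, v):
    """Total sum of products of deviations from the means, as in (3-9)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

def first_predictor(y, predictors):
    """Return (name, SSF, SSR) of the predictor maximizing SSF/SSR, per (3-14)."""
    best = None
    for name, x in predictors.items():
        ssf = spt(y, x) ** 2 / sst(x)              # (3-12)
        ssr = sst(y) - ssf                         # (3-13)
        ratio = ssf / ssr if ssr > 0 else float("inf")
        if best is None or ratio > best[0]:
            best = (ratio, name, ssf, ssr)
    return best[1], best[2], best[3]

y = [1.0, 2.0, 3.0, 4.0, 5.0]
candidates = {"x1": [1.0, 2.0, 3.0, 4.0, 6.0],     # made-up predictors
              "x2": [2.0, 1.0, 2.0, 1.0, 2.0]}
print(first_predictor(y, candidates))
```

In this made-up sample the second candidate has zero covariance with Y, so its fitted sum of squares is zero and the first candidate is chosen.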


Since this criterion is a function of the test statistic, the usual F ratio cannot be used to
test the significance of $X^{(1)}$. Usually, it would be the 95% level F expressed as

$$F_{.95} = F_{\left(1 - \frac{1}{20}\right)} \tag{3-15}$$

But in the stepwise or screening procedure, the 95% level is

$$F^*_{.95} = F_{\left(1 - \frac{1}{20P}\right)} \tag{3-16}$$

where P is the number of candidate predictors (initially M).

Remember that each time a variable is selected as a significant predictor, the value of P
decreases by one and, thus, the probability level of $F^*_{.95}$ should be adjusted accordingly.
This is significant when a small number of predictors are considered or as P approaches one.

Therefore, the predictor $X^{(1)}$ is considered significant if

$$\frac{(n-2)\,SSF(X^{(1)})}{SSR(X^{(1)})} \ge F^*_{.95}(1,\, n-2) \tag{3-17}$$

where the numbers 1 and n-2 within the parentheses represent the degrees of freedom. If, by this
test, $X^{(1)}$ is deemed significant, it then becomes the first of r predictors to be selected from the
original set of M possible predictors. Then the screening technique searches for the next
significant predictor. If, however, $X^{(1)}$ is not significant, no predictors are selected.

To select the second significant predictor $X^{(2)}$, the following calculations are required:

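$$SPT(X^{(1)}X_m) = \sum_{i=1}^{n} (X^{(1)}_i - \bar{X}^{(1)})(X_{mi} - \bar{X}_m) \tag{3-18}$$

where m = 1, ..., M but $X_m \ne X^{(1)}$.

The criterion for choosing $X^{(2)}$ is

$$\frac{SSF(X^{(2)} \mid X^{(1)})}{SSR(X^{(2)} \mid X^{(1)})} \ge \frac{SSF(X_m \mid X^{(1)})}{SSR(X_m \mid X^{(1)})} \qquad \text{for all } m,\; X_m \ne X^{(1)} \tag{3-19}$$

Remember that the line between the variables means that you are finding SSF of $X^{(2)}$ given $X^{(1)}$.
Here,

$$SSF(X_m \mid X^{(1)}) =
\begin{bmatrix} SPT(YX^{(1)}) \\ SPT(YX_m) \end{bmatrix}'
\begin{bmatrix} SST(X^{(1)}) & SPT(X^{(1)}X_m) \\ SPT(X^{(1)}X_m) & SST(X_m) \end{bmatrix}^{-1}
\begin{bmatrix} SPT(YX^{(1)}) \\ SPT(YX_m) \end{bmatrix}
- SSF(X^{(1)}) \tag{3-20}$$

and

$$SSR(X_m \mid X^{(1)}) = SST(Y) - SSF(X^{(1)}) - SSF(X_m \mid X^{(1)}) \tag{3-21}$$

Values for $SSF(X^{(2)} \mid X^{(1)})$ and $SSR(X^{(2)} \mid X^{(1)})$ are obtained in a similar fashion.

This second predictor is judged significant if

$$\frac{(n-3)\,SSF(X^{(2)} \mid X^{(1)})}{SSR(X^{(2)} \mid X^{(1)})} \ge F^*_{.95}(1,\, n-3) \tag{3-22}$$

Remember that the mathematical symbol $[\;]'$ represents the transpose of the matrix or vector within
brackets and $[\;]^{-1}$ represents the inverse of the matrix within brackets.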


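We can now set up the pattern for the selection of all other significant predictors in the
general form:

$$\frac{SSF(X^{(s)} \mid X^{(1)} \ldots X^{(s-1)})}{SSR(X^{(s)} \mid X^{(1)} \ldots X^{(s-1)})} \ge
\frac{SSF(X_m \mid X^{(1)} \ldots X^{(s-1)})}{SSR(X_m \mid X^{(1)} \ldots X^{(s-1)})} \tag{3-23}$$

where m = 1, ..., M but $X_m \ne X^{(1)}, \ldots, X^{(s-1)}$,

$$\begin{aligned}
SSF(X_m \mid X^{(1)} \ldots X^{(s-1)}) ={}&
\begin{bmatrix} SPT(YX^{(1)}) \\ \vdots \\ SPT(YX^{(s-1)}) \\ SPT(YX_m) \end{bmatrix}'
\begin{bmatrix}
SST(X^{(1)}) & \cdots & SPT(X^{(1)}X^{(s-1)}) & SPT(X^{(1)}X_m) \\
\vdots & & \vdots & \vdots \\
SPT(X^{(1)}X^{(s-1)}) & \cdots & SST(X^{(s-1)}) & SPT(X^{(s-1)}X_m) \\
SPT(X^{(1)}X_m) & \cdots & SPT(X^{(s-1)}X_m) & SST(X_m)
\end{bmatrix}^{-1}
\begin{bmatrix} SPT(YX^{(1)}) \\ \vdots \\ SPT(YX^{(s-1)}) \\ SPT(YX_m) \end{bmatrix} \\
&- SSF(X^{(1)}) - \cdots - SSF(X^{(s-1)} \mid X^{(1)} \ldots X^{(s-2)}),
\end{aligned} \tag{3-24}$$

and

$$SSR(X_m \mid X^{(1)} \ldots X^{(s-1)}) = SST(Y) - SSF(X^{(1)}) - SSF(X^{(2)} \mid X^{(1)}) - \cdots - SSF(X_m \mid X^{(1)} \ldots X^{(s-1)}) \tag{3-25}$$

The predictor $X^{(s)}$ is significant if:

$$\frac{[n - (s+1)]\,SSF(X^{(s)} \mid X^{(1)} \ldots X^{(s-1)})}{SSR(X^{(s)} \mid X^{(1)} \ldots X^{(s-1)})} \ge F^*_{.95}(1,\, n - (s+1)) \tag{3-26}$$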


Finally,  the  point  is  reached  where  the  rest  of  the  predictors  or  combinations  thereof  don't 
show  any  significance  and  the  screening  procedure  is  terminated. 
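The whole forward screening loop can be sketched compactly. This is an illustration under several assumptions: the critical value is supplied by a caller-provided function `f_crit(s, remaining, n)` (hypothetical; in practice it would return the F value at the level 1 - 1/(20P) with 1 and n - (s+1) degrees of freedom), the joint fitted sum of squares is obtained by solving the normal equations in deviation form rather than by explicit inversion, and the candidate predictors are assumed not to be collinear.

```python
def deviations(v):
    mean = sum(v) / len(v)
    return [x - mean for x in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(m, v):
    """Gaussian elimination with partial pivoting for a small system m x = v."""
    n = len(v)
    a = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (a[i][n] - sum(a[i][k] * x[k] for k in range(i + 1, n))) / a[i][i]
    return x

def joint_fitted_ss(yd, xds):
    """v' M^-1 v: fitted sum of squares for Y on all predictors in xds."""
    m = [[dot(p, q) for q in xds] for p in xds]
    v = [dot(yd, p) for p in xds]
    return dot(v, solve(m, v))

def screen(y, predictors, f_crit):
    """Forward selection; returns predictor names in the order selected."""
    n = len(y)
    yd = deviations(y)
    sst_y = dot(yd, yd)
    xd = {name: deviations(x) for name, x in predictors.items()}
    chosen, fitted = [], 0.0
    while len(chosen) < len(xd):
        s = len(chosen) + 1
        if n - (s + 1) <= 0:
            break
        best = None
        for name in xd:
            if name in chosen:
                continue
            total = joint_fitted_ss(yd, [xd[c] for c in chosen] + [xd[name]])
            ssf, ssr = total - fitted, sst_y - total     # conditional SSF and SSR
            ratio = ssf / ssr if ssr > 1e-12 else float("inf")
            if best is None or ratio > best[1]:
                best = (name, ratio, total)
        remaining = len(xd) - len(chosen)                # current value of P
        if (n - (s + 1)) * best[1] < f_crit(s, remaining, n):
            break                                        # no significant predictor left
        chosen.append(best[0])
        fitted = best[2]
    return chosen

y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
cands = {"x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 9.0],   # made-up predictors
         "x2": [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0],
         "x3": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]}
print(screen(y, cands, lambda s, remaining, n: 10.0))       # constant threshold, illustration only
```

The constant threshold of 10 in the usage line is a placeholder; a faithful application would look up the F value at the adjusted probability level at each step.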

5.  Applications  of  Screening  Regression 

As mentioned earlier, many different fields have made extensive use of the screening procedure.
However, since we are primarily interested in meteorological applications, we will mention only a
few of these meteorological applications in this section.

The purpose of this section is not to fully describe a few experiments but to tell you what the
experiment considered and direct you to the appropriate material if you desire to study these
experiments in more detail.

Three of the earlier experiments are described in Studies in Statistical Weather Prediction
(1958) with Thomas F. Malone as project director. R. G. Miller was the author of the first
experiment. Here, the desired effect was to determine the predictability of several weather elements, 24
hours in advance at a number of stations. Altogether, seven weather variables were considered at
each of 48 stations in the U. S. Thus, each of the variables at all 48 stations was considered in
developing prediction equations for one or more predictands for each of the many stations tested.

The second experiment was authored by K. W. Veigas, R. G. Miller, and G. M. Howe. This
experiment considered the "Probabilistic Prediction of Hurricane Movements by Synoptic Climatology."
Selected hurricanes and tropical storms that occurred between 1928 and 1953 were used as the
developmental sample from which the prediction equations were derived. A total of 447 storms were
considered. Ninety-five variables were considered for each of the storms. Ninety-one of
these variables were grid pressure values. Two of the four remaining variables were position
coordinates for time (t=0) and the other two were position coordinates 24 hours prior to prediction time.
The screening regression procedure then determined which of the 95 variables were the strongest
predictors. This experiment concluded that "the surface pressure pattern does contain a useful
amount of information about the future movement of tropical cyclones for the subsequent twenty-four
hour period."

The third experiment was authored by R. G. Miller and G. M. Howe and considered the "Statistical
Prediction of the 500-mb Pattern" over North America during January and February 1957. One 24-hour
prediction equation was derived for each of the 46 points used in the predictand grid by the
screening regression technique. This experiment concluded that, "The predictions, which from an
operational point of view are easily derived, produced results with errors of the same general magnitude
as those of the JNWP barotropic model in use at the time of comparison."

R. C. Miller (1972), while working at AFGWC, used the screening procedure in a study of 328
tornado cases and found 14 significant parameters for forecasting severe storms. These parameters
are contained in his Technical Report 200 and listed in rank order.

David (1973) used screening regression to develop equations for two forecast periods, providing an
early morning determination of the severe thunderstorm potential for the day. He used Model Output
Statistics (MOS) and screening regression to study the occurrence of severe thunderstorms within a
radius of 120 nm of 32 stations through the central and eastern United States. For predictors he
used the PE model forecast data transmitted via teletype as FOUS bulletins, together with the
observed 0600Z surface data.

For the first period equation, 12-, 18-, and 24-hour forecasts from the PE model for each of the
following predictors were screened: mean relative humidity for 3 layers, 6-hourly quantitative
precipitation totals, vertical velocity at 700 mb, the lifted index, 1000-500 mb thickness, u and v
components of the mean wind of the boundary layer, and mean potential temperature and mean pressure
of the boundary layer. For the second period, the only difference was that the 24-, 30-, and 36-hour
forecasts from the PE model were used. David concluded his report by simply stating that ". . .
predictors from the PE model are very useful in forecasting areas of expected severe thunderstorm."

TDL, under M. A. Alaka et al. (1973), also described linear regression equations they developed
in their initial attempt to forecast the likelihood of thunderstorms or severe weather in the
central, eastern, and southern United States. Initially, they ran three experiments to develop
medium-range prediction equations (6-24 hours). All three experiments used 24-hour forecasts from
the National Meteorological Center (NMC) 6-layer Primitive Equation Model and from the TDL
Three-Dimensional Trajectory Model as predictors. One hundred and three predictors were used in the
initial experiment, consisting of dynamic and kinematic parameters, geopotential height and
thickness, humidity, stability parameters, temperatures, winds, and miscellaneous parameters.

Initially  the  predictand  was  determined  from  nationally  transmitted  facsimile  radar  summary  maps 
and  finally  from  manually  digitized  radar  (MDR)  data  and  severe  reports. 

TDL has continued improving their equations over the years. J. P. Charba (1975) described TDL's
short-range equation for severe local storms (2-6 hours after data observation time). The forecast
probabilities of tornadoes, hail, and damaging wind obtained from this equation are transmitted to
the NWS three times daily via teletype, for 90 nm x 135 nm (predictand) rectangular areas.
Screening regression was also used to derive this equation. The predictors are derived mainly from
observed data (not MOS).


6.  Conclusion 

The  examples  mentioned  in  section  5 are  only  a few  of  a great  many  attempts  made  by  statisticians 
and  meteorologists  to  simplify  the  task  of  dealing  with  a seemingly  infinite  number  of  variables. 
While  the  attempts  mentioned  are  by  no  means  all  inclusive,  they  do  show  a variety  of  ways  in 
which  the  screening  procedure  can  be  applied  to  the  field  of  meteorology. 

The screening procedure can be a useful procedure when a large number of predictors are
considered and elimination of the insignificant ones is desired. There are, of course, shortcomings
such as the equations' dependency on the sample used to develop them, the necessity to update the
equations as the sampling data bases change or time steps change, the fact that the "best" set of
variables may not be included in the final equation, and so on. However, as stated earlier in this
paper, the screening procedure can help meteorologists confirm their notions of which predictors are
most significant on the basis of theory.

Chapter 4

MULTIPLE  DISCRIMINANT  ANALYSIS 


by  Captain  William  W.  Neubert 

1.  Rationale  for  Multiple  Discriminant  Analysis  vs  Regression. 

Impetus for the use of the multiple discriminant analysis (MDA) technique in
establishing probabilities arises from a need to classify observed phenomena into
pre-established groups. MDA is well suited to performing this classification where
the groups involved have no particular order or ranking.

Within a set of previously observed or recorded data, we might be interested in
learning where a new observation might fit into this group. That is, where does
our new observation lie with respect to the mean of the group? Such predictions
could best be handled through multiple regression. Suppose, however, that we have
several groups into which to classify our observations, no one group having any
particular rank or order with respect to any other group. We might, for example, be
trying to generate the probability for the occurrence of a particular wind direction
or a certain type of precipitation. In these two examples the classes into which
we could separate the observed phenomena bear no ranking among them; i.e., snow
is neither better nor worse than rain; it's just different. The need is clear,
however, for some method to predict into which of the unordered classes future
observations will fall. MDA provides one way to derive the probability forecasts
for such events. Multiple regression techniques would be more appropriate
if we were dealing with ordered or scaled data such as temperature or wind velocity.
In summary, multiple regression helps define distinctions within groups, whereas
discriminant analysis delineates distinctions between groups.

The discriminant function eliminates the need for looking at the measurements
one at a time, only to find that the overlap of the data obscures any conclusions
we might have been able to draw from the observations. MDA applies a set of weights
(coefficients) to the selected variables and adds the resulting products to produce
a single discriminant score. This score makes the best use
of all the information contained in the variables we have selected to use. Given
the groups involved, the computation develops the best set of weights possible from
the measurements, and, in effect, sifts out the important differences that best
separate the groups. The overlap in the raw data that had acted to obscure these
differences is then reduced or removed. The technique is improved, or limited as
the case may be, by the amount of information contained in the original predictor
variables about the phenomena we are trying to predict (Rulon, 1951, pp. 82-3).

2.  Graphical  Interpretation. 

The fundamental idea of the discriminant function is best understood by viewing
a graphical depiction of the results. Suppose two variables, $X_1$ and $X_2$ say, are
considered meteorologically significant in the forecasting of precipitation. We
wish to use the two variables to predict the occurrence of rain, or snow, or no
precipitation at all. The forecast is to be valid six hours from the time of the
observation of the values of $X_1$ and $X_2$. Now, assume that we have accumulated a
sample of three years of data on the two variables. We also know the type of
weather that occurred six hours after the recording of the corresponding values
of $X_1$ and $X_2$. A graph can be constructed showing the values of $X_1$ and $X_2$ and the
type of weather corresponding to each set of observations (Figure 1). The weather
would be coded with a distinct plotting symbol for each of the three classes: snow,
rain, and none.

The  object  of  this  exercise  would  be  to  take  newly  observed  values  of  our 
predictors  and,  using  the  graph,  produce  a forecast  of  the  probability  of  rain  or 
snow  six  hours  hence.  It  is  clear,  however,  that  our  graph  of  the  raw  predictors 
has  produced  a great  deal  of  overlap  among  data  points,  and  that  such  a prediction 
is  all  but  impossible. 

Now suppose we develop a particular computational technique that uses our
accumulated data to produce two new values:

$$Y_1 = V_{11}X_1 + V_{12}X_2 \quad\text{and}\quad Y_2 = V_{21}X_1 + V_{22}X_2.$$

NOTE: Figures 1 and 2 are exaggerated examples from fictitious data, and are shown
only to provide clarity of meaning.

Figure 1. Graph of Weather Predictand in Raw Predictor Space.

The functions $Y_1$ and $Y_2$ are weighted sums of the original predictor variables and,
as we shall see, serve as a better tool for prediction than the raw predictors
alone. If we have carefully selected our predictors to include that set of variables
that contain the most information about the phenomena we are trying to predict,
then the plot of $Y_1$ and $Y_2$, with corresponding weather, might look something
like Figure 2.

These new weighted sum functions, called discriminants, provide a great deal
more separation between the several distinct groups in our precipitation example.
When compared to the plot of the raw predictor data, we can easily see how the
discriminants "stretch" the data apart, reducing or eliminating the overlap. For
clarity, these two sample figures have been greatly exaggerated and thus present
an oversimplified version of the results of a very complicated mathematical process
(Tanur, 1972, pp. 376-80).

Figure 2. Graph of Weather Predictand in Discriminant Space.
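
The "stretching apart" described in this section can be illustrated numerically. The following is a minimal sketch, not taken from the original paper: the observations are made up, the helper name `ss_ratio` is hypothetical, and the weights are picked by hand rather than derived by MDA. It compares the ratio of between-group to within-group sum of squares for each raw predictor with the same ratio for one weighted sum of the two.

```python
def ss_ratio(groups):
    """Ratio of between-group to within-group sum of squares for one variable."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    ssb = ssw = 0.0
    for g in groups:
        mean = sum(g) / len(g)
        ssb += len(g) * (mean - grand_mean) ** 2
        ssw += sum((v - mean) ** 2 for v in g)
    return ssb / ssw

# Made-up (X1, X2) observations for three weather classes (illustration only).
data = {
    "snow": [(1.0, 2.2), (2.0, 2.9), (3.0, 4.1), (4.0, 4.8)],
    "rain": [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8)],
    "none": [(2.0, 0.9), (3.0, 2.1), (4.0, 2.8), (5.0, 4.2)],
}
weights = (-1.0, 1.0)  # hand-picked for illustration, not an optimum from MDA

x1 = [[p[0] for p in pts] for pts in data.values()]
x2 = [[p[1] for p in pts] for pts in data.values()]
y = [[weights[0] * a + weights[1] * b for a, b in pts] for pts in data.values()]

ratio_x1, ratio_x2, ratio_y = ss_ratio(x1), ss_ratio(x2), ss_ratio(y)
print(f"X1: {ratio_x1:.2f}  X2: {ratio_x2:.2f}  weighted sum: {ratio_y:.2f}")
```

With these numbers each raw predictor separates the classes poorly, while the weighted sum separates them far better. MDA, as developed in the next section, chooses the weights so that this ratio is as large as possible.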


3.  Mathematical  Procedure. 

In line with the simplified example discussed above, when the number of group
classifications exceeds two, the discriminant analysis becomes multi-dimensional.
If there are G of these groups, then there are G points in space, each representing
the mean of one of the G groups. If the number of predictor variables is
at least G-1, then these group means will define a discriminant space of G-1
dimensions (Bryan, 1951, p. 90).

The dimensions or directions delineate areas of major differences between the
groups. In principal components analysis the concept of dimensions or directions
in space is closely related to the algebraic idea of linear combinations. To study
the directions of group differences, then, we are tasked with finding linear
combinations of the original predictor variables that exhibit large differences in group
means. Multiple discriminant analysis is a technique for finding those combinations
that will separate the group means of the predictor variables to the maximum
degree allowed by the predictor variables we have chosen (Tatsuoka, 1971, p. 157).

Assume that we have G mutually exclusive and exhaustive groups into which we
are to classify future observations. We have chosen a group of predictors that
contain information about which of the G groups future observations will fall into.
We denote these predictors by $X_p$, numbering from p = 1 to a total of P variables.
Assume we have a dependent sample of data compiled from observations of our X
predictors, and let the number of observations in each group of this sample
be $n_g$ (g = 1, ..., G), so that

$$\sum_{g=1}^{G} n_g = N = \text{total dependent sample size}. \tag{1}$$
Now consider a generalized linear function of our predictors:

$$Y = V_1X_1 + V_2X_2 + \cdots + V_PX_P. \tag{2}$$

Y represents a general notation for our discriminant function, whereas an individual
value might be $Y_{jgk}$ (sample observation k, in group g, of discriminant function
j). We then note that for an individual discriminant $Y_1$ the sum of squares
between (among) groups is:

$$SSB(Y_1) = \sum_{g=1}^{G} n_g\left(\bar{Y}_{1g.} - \bar{Y}_{1..}\right)^2, \tag{3}$$

where a bar denotes a mean and a dot replaces the subscript over which the mean is taken.
The complete notation for our first discriminant function for the kth observation
of group g becomes:

$$Y_{1gk} = V_{11}X_{1gk} + V_{12}X_{2gk} + \cdots + V_{1P}X_{Pgk}. \tag{4}$$

The task is to determine the coefficients $V_{11}, \ldots, V_{1P}$ so that the ratio of the
between groups sum of squares to the within groups sum of squares is a maximum.
We call this ratio the discriminant criterion and denote it as $\lambda$; thus, the
maximized ratio is

$$\lambda_1 = \frac{SSB(Y_1)}{SSW(Y_1)}. \tag{5}$$

This process of maximization can be carried out by expressing the sums of
squares as quadratic functions of the predictor variables and then applying
differential calculus to find maxima. However, the computation is greatly simplified
by using algebra (Bryan, 1951, pp. 90-95). Accordingly, we return to our predictor
variables $X_p$, and for each of the G groups compute the raw sums and sums of squares
for each $X_p$ (p = 1, ..., P). The pooled within group sum of squared deviations about
the group mean, $SSW(X_p)$, can be computed, since:

$$SSW(X_p) = \sum_{g=1}^{G}\sum_{k=1}^{n_g}\left(X_{pgk} - \bar{X}_{pg.}\right)^2
= \sum_{g=1}^{G}\left[\sum_{k=1}^{n_g} X_{pgk}^2 - \frac{\left(\sum_{k=1}^{n_g} X_{pgk}\right)^2}{n_g}\right],
\qquad (p = 1, \ldots, P). \tag{6}$$


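The initial calculation of the raw sum and sum of squares will yield the total sum
of squared deviations about the grand mean, $SST(X_p)$, because:

$$SST(X_p) = \sum_{g=1}^{G}\sum_{k=1}^{n_g}\left(X_{pgk} - \bar{X}_{p..}\right)^2
= \sum_{g=1}^{G}\sum_{k=1}^{n_g} X_{pgk}^2 - \frac{\left(\sum_{g=1}^{G}\sum_{k=1}^{n_g} X_{pgk}\right)^2}{\sum_{g=1}^{G} n_g},
\qquad (p = 1, \ldots, P). \tag{7}$$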


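Subtracting the pooled within group sum of squares, $SSW(X_p)$, from the total sum of
squares, $SST(X_p)$, yields the sum of squared deviations between group means and the
grand mean. That is:

$$SSB(X_p) = SST(X_p) - SSW(X_p), \qquad (p = 1, \ldots, P)$$

since

$$SSB(X_p) = \sum_{g=1}^{G}\sum_{k=1}^{n_g}\left(\bar{X}_{pg.} - \bar{X}_{p..}\right)^2
= \sum_{g=1}^{G} n_g\left(\bar{X}_{pg.} - \bar{X}_{p..}\right)^2
= \sum_{g=1}^{G}\frac{\left(\sum_{k=1}^{n_g} X_{pgk}\right)^2}{n_g} - \frac{\left(\sum_{g=1}^{G}\sum_{k=1}^{n_g} X_{pgk}\right)^2}{\sum_{g=1}^{G} n_g},
\qquad (p = 1, \ldots, P). \tag{8}$$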
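
The decomposition of the total sum of squares into within-group and between-group parts can be checked on a small data set. The sketch below is illustrative only: the numbers are invented and the function name `sums_of_squares` is an assumption. It computes the total and within-group sums of squares from raw sums and sums of squares, and the between-group sum of squares directly from the group means.

```python
def sums_of_squares(groups):
    """Return (SST, SSW, SSB) for one predictor.

    groups: list of lists, one list of observed values per group.
    SST and SSW use the raw-sum forms; SSB is computed directly from
    the group means so that the identity SSB = SST - SSW can be checked.
    """
    n_total = sum(len(g) for g in groups)
    raw_sum = sum(sum(g) for g in groups)
    raw_sum_sq = sum(v * v for g in groups for v in g)

    sst = raw_sum_sq - raw_sum ** 2 / n_total
    ssw = sum(sum(v * v for v in g) - sum(g) ** 2 / len(g) for g in groups)

    grand_mean = raw_sum / n_total
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    return sst, ssw, ssb

# Invented values of a single predictor in three groups of unequal size.
groups = [[2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [1.0, 2.0]]
sst, ssw, ssb = sums_of_squares(groups)
print(f"SST = {sst:.4f}, SSW = {ssw:.4f}, SSB = {ssb:.4f}, SST - SSW = {sst - ssw:.4f}")
```

The last two printed values agree, which is the relation used to obtain the between-group quantity by subtraction.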


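Additionally, the sums of cross products between each of the raw predictors
must be used to calculate the sum of products between and within groups for
variables $X_p$ and $X_q$, let's say, with p, q = 1, ..., P and p ≠ q. The notation is
$SPW(X_pX_q)$ and $SPB(X_pX_q)$. For the within groups determination we have:

$$SPW(X_pX_q) = \sum_{g=1}^{G}\sum_{k=1}^{n_g}\left(X_{pgk} - \bar{X}_{pg.}\right)\left(X_{qgk} - \bar{X}_{qg.}\right)
= \sum_{g=1}^{G}\left[\sum_{k=1}^{n_g} X_{pgk}X_{qgk} - \frac{\left(\sum_{k=1}^{n_g} X_{pgk}\right)\left(\sum_{k=1}^{n_g} X_{qgk}\right)}{n_g}\right],
\qquad (p, q = 1, \ldots, P;\; p \neq q). \tag{9}$$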


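Using the raw sums and sum of cross products we can obtain the total sum of
products, $SPT(X_pX_q)$, since:

$$SPT(X_pX_q) = \sum_{g=1}^{G}\sum_{k=1}^{n_g}\left(X_{pgk} - \bar{X}_{p..}\right)\left(X_{qgk} - \bar{X}_{q..}\right)
= \sum_{g=1}^{G}\sum_{k=1}^{n_g} X_{pgk}X_{qgk} - \frac{\left(\sum_{g=1}^{G}\sum_{k=1}^{n_g} X_{pgk}\right)\left(\sum_{g=1}^{G}\sum_{k=1}^{n_g} X_{qgk}\right)}{\sum_{g=1}^{G} n_g},
\qquad (p, q = 1, \ldots, P;\; p \neq q). \tag{10}$$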


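As before, the sum of products between groups, $SPB(X_pX_q)$, is obtained by subtracting
the sum of products within groups, $SPW(X_pX_q)$, from the total sum of products,
$SPT(X_pX_q)$; that is:

$$SPB(X_pX_q) = SPT(X_pX_q) - SPW(X_pX_q). \tag{11}$$

This results since:

$$SPB(X_pX_q) = \sum_{g=1}^{G}\sum_{k=1}^{n_g}\left(\bar{X}_{pg.} - \bar{X}_{p..}\right)\left(\bar{X}_{qg.} - \bar{X}_{q..}\right)
= \sum_{g=1}^{G} n_g\left(\bar{X}_{pg.} - \bar{X}_{p..}\right)\left(\bar{X}_{qg.} - \bar{X}_{q..}\right)
= \sum_{g=1}^{G}\frac{\left(\sum_{k=1}^{n_g} X_{pgk}\right)\left(\sum_{k=1}^{n_g} X_{qgk}\right)}{n_g} - \frac{\left(\sum_{g=1}^{G}\sum_{k=1}^{n_g} X_{pgk}\right)\left(\sum_{g=1}^{G}\sum_{k=1}^{n_g} X_{qgk}\right)}{\sum_{g=1}^{G} n_g},
\qquad (p, q = 1, \ldots, P;\; p \neq q). \tag{12}$$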


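A matrix W, representing the pooled within group matrix, is then constructed
from these derived quantities:

$$W = \begin{bmatrix}
SSW(X_1) & SPW(X_1X_2) & \cdots & SPW(X_1X_{P-1}) & SPW(X_1X_P) \\
SPW(X_1X_2) & SSW(X_2) & \cdots & SPW(X_2X_{P-1}) & SPW(X_2X_P) \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
SPW(X_1X_{P-1}) & SPW(X_2X_{P-1}) & \cdots & SSW(X_{P-1}) & SPW(X_{P-1}X_P) \\
SPW(X_1X_P) & SPW(X_2X_P) & \cdots & SPW(X_{P-1}X_P) & SSW(X_P)
\end{bmatrix} \tag{13}$$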


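The data previously accumulated from (8) and (12) about the sums of squares and products
between the various groups are then used to construct a pooled between groups matrix B:

$$B = \begin{bmatrix}
SSB(X_1) & SPB(X_1X_2) & \cdots & SPB(X_1X_{P-1}) & SPB(X_1X_P) \\
SPB(X_1X_2) & SSB(X_2) & \cdots & SPB(X_2X_{P-1}) & SPB(X_2X_P) \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
SPB(X_1X_{P-1}) & SPB(X_2X_{P-1}) & \cdots & SSB(X_{P-1}) & SPB(X_{P-1}X_P) \\
SPB(X_1X_P) & SPB(X_2X_P) & \cdots & SPB(X_{P-1}X_P) & SSB(X_P)
\end{bmatrix} \tag{14}$$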
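
As an illustration of how the two matrices might be assembled in practice, the sketch below builds W and B from grouped observations using the raw-sum forms and the subtraction "between = total - within". It is a minimal example under stated assumptions: the data are made up, the function name `scatter_matrices` is hypothetical, and plain nested lists are used instead of a matrix library.

```python
def scatter_matrices(groups):
    """Build the pooled within-group matrix W and between-group matrix B.

    groups: list of groups; each group is a list of observations and each
    observation is a list of P predictor values.
    Returns (W, B) as P x P nested lists.
    """
    P = len(groups[0][0])
    N = sum(len(g) for g in groups)
    totals = [sum(obs[p] for g in groups for obs in g) for p in range(P)]

    W = [[0.0] * P for _ in range(P)]
    T = [[0.0] * P for _ in range(P)]
    for p in range(P):
        for q in range(P):
            # Total sum of squares (p == q) or products (p != q) about the grand mean.
            raw = sum(obs[p] * obs[q] for g in groups for obs in g)
            T[p][q] = raw - totals[p] * totals[q] / N
            # Pooled within-group sum of squares or products.
            for g in groups:
                sum_p = sum(obs[p] for obs in g)
                sum_q = sum(obs[q] for obs in g)
                W[p][q] += sum(obs[p] * obs[q] for obs in g) - sum_p * sum_q / len(g)
    # Between-group quantities by subtraction.
    B = [[T[p][q] - W[p][q] for q in range(P)] for p in range(P)]
    return W, B

# Made-up observations of two predictors in three groups (illustration only).
groups = [
    [[1.0, 2.2], [2.0, 2.9], [3.0, 4.1], [4.0, 4.8]],
    [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.8]],
    [[2.0, 0.9], [3.0, 2.1], [4.0, 2.8], [5.0, 4.2]],
]
W, B = scatter_matrices(groups)
for name, m in (("W", W), ("B", B)):
    print(name, [[round(v, 3) for v in row] for row in m])
```

Both matrices come out symmetric, with the sums of squares on the diagonal and the sums of products off the diagonal.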


The computational procedure for deriving the discriminant functions will make
use of the fact that all the discriminants are calculated from these matrices W and
B. The actual process involves first pre-multiplying B by the inverse of W, and
then determining the eigenvalues and eigenvectors of the matrix that results. The
latter operation produces the solution to the determinantal equation

$$\left|W^{-1}B - \lambda I\right| = 0 \qquad (I = \text{the unit matrix}). \tag{15}$$

The eigenvectors are solutions to the equation:

$$\left(W^{-1}B - \lambda_j I\right)V_j = 0, \qquad (j = 1, \ldots, \min(G-1, P)). \tag{16}$$

The elements of the eigenvectors $V_j$ represent the weights to be applied to the
original X predictors in our linear function:

$$Y_j = V_{j1}X_1 + V_{j2}X_2 + \cdots + V_{jP}X_P, \qquad (j = 1, \ldots, \min(G-1, P)). \tag{17}$$

The values of $\lambda_j$ are the roots of the characteristic equation resulting from
the expansion of the determinant in (15); they represent the ratios of the
corresponding $Y_j$'s sums of squares between and within groups, namely:

$$\lambda_j = \frac{SSB(Y_j)}{SSW(Y_j)}
= \frac{\sum_{g=1}^{G} n_g\left(\bar{Y}_{jg.} - \bar{Y}_{j..}\right)^2}{\sum_{g=1}^{G}\sum_{k=1}^{n_g}\left(Y_{jgk} - \bar{Y}_{jg.}\right)^2},
\qquad (j = 1, \ldots, \min(G-1, P)). \tag{18}$$

(Miller, 1961, pp. 6-13)
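
The eigenvalue step can be followed by hand for two predictors, where the characteristic equation is a quadratic. The sketch below is a minimal illustration, assuming made-up symmetric matrices W and B (not values from the paper) and hypothetical helper names. It uses the standard identities SSB(Y) = V'BV and SSW(Y) = V'WV for a weighted sum Y with weight vector V, and confirms that each root of the characteristic equation equals the ratio of between to within sums of squares of its discriminant.

```python
import math

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def multiply_2x2(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def quadratic_form(v, m):
    """v' m v: the sum of squares of the weighted sum when m is W or B."""
    return sum(v[i] * m[i][j] * v[j] for i in range(2) for j in range(2))

# Made-up pooled within-group and between-group matrices (illustration only).
W = [[4.0, 1.0], [1.0, 3.0]]
B = [[6.0, 2.0], [2.0, 1.0]]

A = multiply_2x2(inverse_2x2(W), B)          # the matrix W^{-1} B

# Characteristic equation for a 2 x 2 matrix: lam^2 - trace*lam + det = 0.
trace = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
root = math.sqrt(trace * trace - 4.0 * det)
eigenvalues = [(trace + root) / 2.0, (trace - root) / 2.0]

results = []
for lam in eigenvalues:
    # Solves the first row of (A - lam I) v = 0; requires A[0][1] != 0.
    v = [A[0][1], lam - A[0][0]]
    norm = math.hypot(v[0], v[1])
    v = [v[0] / norm, v[1] / norm]
    ratio = quadratic_form(v, B) / quadratic_form(v, W)
    results.append((lam, v, ratio))
    print(f"lambda = {lam:.6f}  SSB(Y)/SSW(Y) = {ratio:.6f}")
```

For more than two predictors a general eigenvalue routine would be needed in place of the quadratic formula.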

The characteristic equation derived from $W^{-1}B$ will no doubt have several roots.
From each of these roots $\lambda_j$ we can calculate an eigenvector $V_j$, the elements of
which represent a new set of combining weights $V_{j1}, V_{j2}, \ldots, V_{jP}$. If these new
elements are used to form a second linear combination

$$Y_2 = V_{21}X_1 + V_{22}X_2 + \cdots + V_{2P}X_P,$$

then $Y_2$ is a new discriminant function that is uncorrelated with $Y_1$, but having its
ratio $\lambda_2$ of sum of squares between groups to sum of squares within groups a maximum
after $\lambda_1$. As the process continues, each successive Y is linearly uncorrelated
with any of the preceding linear combinations, and has its ratio of sum of squares
between and within groups a maximum. The values created, $(Y_1, Y_2, \ldots, Y_{\min(G-1,P)})$,
are the first, second, etc. discriminant functions for optimally differentiating
among the G groups. Thus, by having several discriminators, we are shown the
dimensions of the differences between groups and the direction along which maximum
group differences occur.

In principal component analysis the dimension corresponding to the first component
has maximum variance, whereas the second component's dimension has maximum
variance among those uncorrelated with the first, and so on.

In MDA, $\lambda$, the ratio of between to within groups sums of squares, merely takes
the place of variance as the factor determining the various successive dimensions.
It should be noted, however, that in the discriminant space the axes are not
necessarily mutually orthogonal even though they are uncorrelated. Although the
discriminant function performs a linear transformation on the original X predictor
axes, the rotation that occurs may indeed be an oblique rotation (Tatsuoka, 1971,
pp. 162-3).
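
The distinction between "uncorrelated" and "orthogonal" can be seen in a small numerical case. The sketch below is illustrative only, using made-up two-predictor matrices W and B. It computes the two weight vectors, then evaluates their ordinary dot product (the cosine of the angle between the axes) and their pooled within-group cross product $V_1'WV_2$. The second quantity is zero for eigenvectors of $W^{-1}B$ with distinct eigenvalues, which is the sense in which the discriminants are uncorrelated here.

```python
import math

# Made-up pooled within-group and between-group matrices (illustration only).
W = [[4.0, 1.0], [1.0, 3.0]]
B = [[6.0, 2.0], [2.0, 1.0]]

det_w = W[0][0] * W[1][1] - W[0][1] * W[1][0]
W_inv = [[W[1][1] / det_w, -W[0][1] / det_w],
         [-W[1][0] / det_w, W[0][0] / det_w]]
A = [[sum(W_inv[i][k] * B[k][j] for k in range(2)) for j in range(2)]
     for i in range(2)]                      # the matrix W^{-1} B

trace = A[0][0] + A[1][1]
det_a = A[0][0] * A[1][1] - A[0][1] * A[1][0]
root = math.sqrt(trace * trace - 4.0 * det_a)
lam1, lam2 = (trace + root) / 2.0, (trace - root) / 2.0

def unit_eigenvector(lam):
    # Solves the first row of (A - lam I) v = 0; requires A[0][1] != 0.
    v = [A[0][1], lam - A[0][0]]
    norm = math.hypot(v[0], v[1])
    return [v[0] / norm, v[1] / norm]

v1, v2 = unit_eigenvector(lam1), unit_eigenvector(lam2)

cosine = v1[0] * v2[0] + v1[1] * v2[1]       # zero only if the axes are orthogonal
cross_w = sum(v1[i] * W[i][j] * v2[j] for i in range(2) for j in range(2))

print(f"cosine of angle between axes: {cosine:.4f}")
print(f"within-group cross product:   {cross_w:.2e}")
```

The cosine is clearly nonzero while the within-group cross product vanishes to rounding error, so the two discriminant axes are oblique yet uncorrelated.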

We can reiterate the entire procedure in a stepwise fashion:

a. Assemble the within groups sum of squares from (6).

b. Compute the total sum of squares about the grand mean, $SST(X_p)$, using (7).

c. Next, assemble the data from steps a and b and calculate the sum of squared
deviations between group means and the grand mean using:

$$SSB(X_p) = SST(X_p) - SSW(X_p).$$

d. Compute the sum of products between groups using (10) and (12); and then
construct the matrix B as in (14).

e. Derive the matrix $W^{-1}B$ whose eigenvalues and eigenvectors we wish to find,
using (13) and (14).

f. Develop the characteristic equation for the matrix found in e using the
relation:

$$\left|W^{-1}B - \lambda I\right| = 0$$

in expanded form.

g. Solve for the roots of the equation in step f.


h. These roots, $\lambda_1$, $\lambda_2$, etc., are then used to determine the eigenvectors
$V_1$, $V_2$, etc., whose elements are the weights to be applied to the original
predictors in our linear relation for defining the various $Y_j$'s. The procedure for
finding these characteristic vectors, as outlined by Tatsuoka, follows:

(1) For each given eigenvalue $\lambda_j$, form the matrix $W^{-1}B - \lambda_j I$
by subtracting $\lambda_j$ from each diagonal element of $W^{-1}B$.

(2) Compute the adjoint of this matrix, adj$(W^{-1}B - \lambda_j I)$. The
adjoint or adjugate of a matrix A is found by gathering the cofactors of all the
elements of the matrix and then using these cofactors to form a new matrix adj(A).
However, the new matrix is constructed so that the cofactors of the elements of the
first row of the original matrix A are now used to form the elements of the first
column of adj(A); those of the second row of the cofactor matrix become the second
column of the adj matrix, etc.

(3) Next divide the elements of any (nonzero) column of adj$(W^{-1}B - \lambda_j I)$ by the
square root of the sum of the squares of these elements. The resulting numbers
are the elements of the eigenvector $V_j$.

i. Each value of $\lambda_j$ produces an eigenvector $V_j$ whose elements are the weights
or coefficients in the linear equation for each of the discriminant functions $Y_j$.
That is,

$$V_1 = \begin{bmatrix} V_{11} \\ V_{12} \\ V_{13} \end{bmatrix}
\quad\text{and}\quad
V_2 = \begin{bmatrix} V_{21} \\ V_{22} \\ V_{23} \end{bmatrix}$$

are eigenvectors derived from discriminant criteria $\lambda_1$ and $\lambda_2$ (eigenvalues)
to be used to set up the linear relations

$$Y_1 = V_{11}X_1 + V_{12}X_2 + V_{13}X_3$$

and

$$Y_2 = V_{21}X_1 + V_{22}X_2 + V_{23}X_3$$

respectively (Tatsuoka, 1971, pp. 166-70).
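
The adjoint method of step h can be sketched in a few lines. The example below is a minimal illustration, not the author's program: the helper names are hypothetical, and the 3 x 3 matrix is a stand-in for $W^{-1}B$ chosen to be triangular so that its eigenvalues (2, 3 and 6, the diagonal entries) are known in advance. Because some columns of the adjoint can be entirely zero, the sketch picks the column of largest length rather than an arbitrary one.

```python
import math

def minor(m, i, j):
    """Matrix m with row i and column j removed."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(m) if r != i]

def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))

def adjugate(m):
    """Transpose of the cofactor matrix."""
    n = len(m)
    cof = [[(-1) ** (i + j) * det(minor(m, i, j)) for j in range(n)]
           for i in range(n)]
    return [[cof[j][i] for j in range(n)] for i in range(n)]

def eigenvector(a, lam):
    """Unit eigenvector of a for a known, simple eigenvalue lam."""
    n = len(a)
    shifted = [[a[i][j] - (lam if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    adj = adjugate(shifted)
    columns = [[adj[i][j] for i in range(n)] for j in range(n)]
    col = max(columns, key=lambda c: sum(x * x for x in c))  # a nonzero column
    norm = math.sqrt(sum(x * x for x in col))
    return [x / norm for x in col]

# Stand-in for W^{-1}B: triangular, so the eigenvalues are 2, 3 and 6.
A = [[2.0, 0.0, 0.0],
     [1.0, 3.0, 0.0],
     [4.0, 5.0, 6.0]]

vectors = {lam: eigenvector(A, lam) for lam in (2.0, 3.0, 6.0)}
for lam, v in vectors.items():
    print(lam, [round(x, 4) for x in v])
```

The method relies on each eigenvalue being a simple root; for a repeated root the adjoint is the zero matrix and another approach is required.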


4.  Selecting  the  Predictors. 


In applications of multiple regression analysis it is generally agreed among
researchers that most of whatever predictability is in a particular set of
predictors is contained in a reasonably small subset of the total group. Indeed, the
sheer unreasonableness of the computations involved in evaluating huge matrices
constructed of all the predictors led to the development of a technique called
screening regression. This method was first developed by Joseph G. Bryan (1951). Its
purpose was to define a small group of the predictors that contained most of the
predicting information required.

The technique determines which variable has the highest correlation with the
predictand and chooses that variable as the first predictor. Of the remaining
variables, the predictor having the highest partial correlation, after the effects
of the first predictor are removed, is the next one selected. The process continues,
each successive selection being based on the highest partial correlation after the
effects of previously selected variables have been removed. The screening ceases
at a pre-determined point when further significant improvement is not obtained. If
the original total set of predictor variables was large, this method may provide a
considerable savings in labor.
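
The selection loop just described can be sketched as follows. This is an illustrative sketch under stated assumptions rather than Bryan's procedure as published: the data are made up, the function names are hypothetical, and the stopping rule is a simple threshold on the absolute partial correlation (`min_partial_r`) standing in for a formal significance test. Partial correlations are obtained by repeatedly replacing the predictand and the remaining candidates with their residuals after regression on the predictor just selected.

```python
import math

def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def residual(v, basis):
    """Remove from v its projection on basis (both already centered)."""
    coef = dot(v, basis) / dot(basis, basis)
    return [a - coef * b for a, b in zip(v, basis)]

def screen(predictors, y, min_partial_r=0.3):
    """Return predictor names in order of selection.

    predictors: dict mapping a name to a list of values.
    min_partial_r: hypothetical stopping threshold, not a significance test.
    """
    y_res = center(y)
    candidates = {name: center(v) for name, v in predictors.items()}
    chosen = []
    while candidates:
        def partial_r(name):
            x = candidates[name]
            denom = math.sqrt(dot(x, x) * dot(y_res, y_res))
            return abs(dot(x, y_res)) / denom if denom > 1e-12 else 0.0

        best = max(candidates, key=partial_r)
        if partial_r(best) < min_partial_r:
            break
        basis = candidates.pop(best)
        chosen.append(best)
        # Remove the effect of the selected predictor from everything left.
        y_res = residual(y_res, basis)
        candidates = {n: residual(v, basis) for n, v in candidates.items()}
    return chosen

# Made-up data: y is built from a and b only; c is an unrelated candidate.
predictors = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "b": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
    "c": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0],
}
y = [2.0 * a + 3.0 * b for a, b in zip(predictors["a"], predictors["b"])]

selected = screen(predictors, y)
print(selected)
```

In this constructed case the two variables that generate the predictand are selected, and screening stops because nothing is left for the third candidate to explain.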

Such a stepwise screening process has also been developed for applications to
MDA. Selection is made from a large set of variables to first find that variable
which provides the greatest degree of discrimination between groups as measured
by the generalized Mahalanobis distance, $D^2$, which is discussed below. This
variable is our first predictor $X^{(1)}$. Among the remaining P-1 variables a second is
selected which together with the first gives the highest value of $D^2$. This second
variable is denoted $X^{(2)}$. Added predictors continue to be selected in like manner,
always yielding a maximum value of $D^2$ when combined with the previous predictors
(Miller, 1961, pp. 34-36).

The extension of the Mahalanobis $D^2$ to situations involving more than 2 groups
was first developed by Rao (1952) and Bryan (1950). For P predictors the test
statistic $D^2_P$ is defined as

$$D^2_P = (n - G - P)\cdot \operatorname{trace} W^{-1}B. \tag{19}$$

W and B are the matrices defined earlier, and trace $W^{-1}B$ refers to the sum of the
diagonal elements of the matrix $W^{-1}B$. n is one less than the number of
independent observations in the sample. The distribution of $D^2_P$ is estimated, for
n large, as

$$D^2_P \sim \chi^2\left(P(G-1)\right). \tag{20}$$

A modification of this relation will be used to assist in determining if
successively selected predictors are statistically significant.

The actual screening procedure is applied as follows. For every one of the
P available predictors $X_p$ (p = 1, ..., P) determine the quantities $SSW(X_p)$ and $SSB(X_p)$.
Using these, find

$$\operatorname{trace} W^{-1}B(X_p) = \frac{SSB(X_p)}{SSW(X_p)}, \qquad (p = 1, \ldots, P). \tag{21}$$

To select the first predictor $X^{(1)}$, where the total selected may be r (r ≤ P), we use the
criterion

$$\operatorname{trace} W^{-1}B(X^{(1)}) \geq \operatorname{trace} W^{-1}B(X_p), \qquad (p = 1, \ldots, P) \tag{22}$$

if $X^{(1)}$ can be determined to be statistically significant. A description of this
test is given later.

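For the P-1 remaining variables calculate $SPW(X^{(1)}X_p)$ and $SPB(X^{(1)}X_p)$. From
these and the values of $SSW(X_p)$ and $SSB(X_p)$ already computed we then derive:

$$\operatorname{trace} W^{-1}B(X^{(1)}X_p) = \operatorname{trace}\left\{
\begin{bmatrix} SSW(X^{(1)}) & SPW(X^{(1)}X_p) \\ SPW(X^{(1)}X_p) & SSW(X_p) \end{bmatrix}^{-1}
\begin{bmatrix} SSB(X^{(1)}) & SPB(X^{(1)}X_p) \\ SPB(X^{(1)}X_p) & SSB(X_p) \end{bmatrix}
\right\}, \qquad (p = 1, \ldots, P;\; X_p \neq X^{(1)}). \tag{23}$$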
The selection of the second predictor is conditional on the fact that $X^{(1)}$ has
already been selected. The criterion for $X^{(2)}$ is

$$\operatorname{trace} W^{-1}B(X^{(1)}X^{(2)}) \geq \operatorname{trace} W^{-1}B(X^{(1)}X_p), \qquad (p = 1, \ldots, P;\; X_p \neq X^{(1)})$$

again provided that $X^{(2)}$ is statistically significant. So then, a general form can
be used to choose those to be selected out of the remaining predictors; this is:

$$\operatorname{trace} W^{-1}B(X^{(1)}X^{(2)}\cdots X^{(s)}) \geq \operatorname{trace} W^{-1}B(X^{(1)}X^{(2)}\cdots X^{(s-1)}X_p), \qquad (p = 1, \ldots, P;\; X_p \neq X^{(1)}, \ldots, X^{(s-1)}).$$

The entire procedure continues until r predictors have been chosen. The total
number of selected predictors r is completely determined when the variable $X^{(r+1)}$
fails to show statistical significance (Miller, 1961, pp. 43-47).
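
A compact sketch of this forward selection is given below. It is illustrative only: the 3 x 3 matrices W and B are made up, the function names are hypothetical, and the significance test that ends the selection in the text is replaced here by a fixed number of predictors `r`, which is an assumption made to keep the example short. At each step the candidate is chosen that gives the largest trace of $W^{-1}B$ restricted to the predictors selected so far plus that candidate.

```python
def trace_winv_b(W, B, idx):
    """trace(W_S^{-1} B_S) restricted to the predictors listed in idx."""
    n = len(idx)
    # Augmented matrix [W_S | B_S]; Gauss-Jordan turns it into [I | W_S^{-1} B_S].
    aug = [[W[i][j] for j in idx] + [B[i][j] for j in idx] for i in idx]
    for c in range(n):
        pivot_row = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot_row] = aug[pivot_row], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[c])]
    return sum(aug[i][n + i] for i in range(n))

def select(W, B, r):
    """Pick r predictors one at a time, each maximizing the trace criterion."""
    chosen, history = [], []
    remaining = list(range(len(W)))
    while remaining and len(chosen) < r:
        best = max(remaining, key=lambda p: trace_winv_b(W, B, chosen + [p]))
        chosen.append(best)
        remaining.remove(best)
        history.append(trace_winv_b(W, B, chosen))
    return chosen, history

# Made-up within-group and between-group matrices for P = 3 predictors.
W = [[10.0, 2.0, 1.0], [2.0, 8.0, 3.0], [1.0, 3.0, 12.0]]
B = [[5.0, 1.0, 2.0], [1.0, 9.0, 3.0], [2.0, 3.0, 4.0]]

chosen, history = select(W, B, r=2)
print("selected predictor indices:", chosen)
print("trace after each step:", [round(t, 4) for t in history])
```

The first choice is simply the predictor with the largest single ratio SSB/SSW; later choices depend on the sums of products as well, so a predictor that looks weak alone can still be selected if it adds new information.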

A particular predictor, say $X^{(s)}$, is deemed significant if

$$D^2_1 - D^2_0 > \chi^2_{\alpha}. \tag{25}$$

That is, the criterion for deciding when to discontinue selection is based on Rao's
test on $D^2$, with a modification introduced to assist in determining the significance
of a newly selected variable.

After choosing a predictor from a set of variables, a chi-square test is
performed and the critical value of $\alpha^*$ is set somewhat arbitrarily at .05. This
allows a 1/20 chance of the predictor chosen being judged significant when in fact it is
not. The test size is then designated as $\alpha$ = .05. In our selection procedure,
however, a predictor variable is chosen out of the P total because it maximizes
some function of the test statistic. It is necessary, therefore, to consider at
what level of probability the critical value of $\alpha^*$ should be set while still
allowing for only a 1/20 chance of error.

Let $\alpha^*$ represent the probability that one or more of these predictors are
adjudged significant when, in reality, not one of the P total is significant. So,
$1 - \alpha^* = (1 - \alpha)^P$, provided that the P tests are independent. If $\alpha$ is small, we
assume that $(1 - \alpha)^P \approx 1 - P\alpha$. Then $1 - \alpha^* \approx 1 - P\alpha$, and

$$\alpha \approx \frac{\alpha^*}{P}. \tag{26}$$

Let $\chi^2_{\alpha}$ be the critical chi-square value when selection is performed. Then, if
$\alpha^*$ denotes the desired size of the selection test,

$$\chi^2_{\alpha} \approx \chi^2_{\alpha^*/P}.$$

P, again, is the total number of predictors possible. Therefore, the critical
value of $\chi^2$ for testing the significance of the selected predictor $X^{(s)}$ is:

$$\chi^2_{\alpha^*/P}, \qquad (s = 1, \ldots, P). \tag{28}$$

Further, based on maximizing $D^2$ as our selection criterion and a level of
significance as expressed in (28), the test of the selected predictor is:

$$D^2_s - D^2_{s-1} > \chi^2_{\alpha^*/P}, \qquad (s = 1, \ldots, P).$$

(Miller, 1961, pp. 49-51)
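
The size adjustment can be checked with a few lines of arithmetic. The sketch below is illustrative: the value P = 20 and the function name `overall_size` are assumptions. It compares the overall chance of at least one false selection among P independent tests when each test is run at the unadjusted level with the same chance when each test is run at the desired overall level divided by P.

```python
def overall_size(per_test_size, n_tests):
    """Chance that at least one of n independent tests is falsely significant."""
    return 1.0 - (1.0 - per_test_size) ** n_tests

alpha_star = 0.05          # desired overall size of the selection test
P = 20                     # hypothetical number of candidate predictors

unadjusted = overall_size(alpha_star, P)        # every test run at .05
adjusted = overall_size(alpha_star / P, P)      # every test run at .05 / P

print(f"overall size with per-test level .05:     {unadjusted:.3f}")
print(f"overall size with per-test level .05/{P}:  {adjusted:.4f}")
```

Without the adjustment the chance of at least one spurious selection among 20 candidates is well over one half; with it, the overall size stays just under the intended .05, which shows how close the linear approximation is for small test sizes.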


5.  Estimating  Probabilities. 


The application of MDA to meteorological parameters seems particularly well
suited to predicting the occurrence of events that can be classified into unordered
groups. Miller's original work demonstrated that problems can arise in achieving
adequate discrimination if multivariate normality is not present. The original
idea had been to select predictors and obtain a posteriori probabilities using
Bayes' theorem, assuming this normality and equal dispersion. It soon became
apparent that these assumptions were not always tenable.

The first effort that was successful in using more than just the first
discriminant function was made when a rectangular area was constructed around a new
observation $(Y_1, Y_2)$ in the two dimensional discriminant space. Group relative frequencies
inside this area, constructed from observations of the development sample, were taken
as estimates of the conditional probability distribution. A method was then
employed that used the idea of the Euclidean distance as a way of defining a more
desirable spherical area (neighborhood) about $(Y_1, Y_2)$. This method turned out to
be highly successful at providing valid probabilities in the multi-dimensional
space.

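The procedure was first developed by Fix and Hodges (1951) and requires
computing the Euclidean distance D between the new observation (y) and each of the
observations of the dependent sample. So, for all the N observations, determine
the weighted distances

$$D(y', Y'_{gk}) = \left[\sum_{j=1}^{t}\left(y'_j - Y'_{jgk}\right)^2\right]^{1/2},
\qquad (k = 1, \ldots, n_g;\; g = 1, \ldots, G) \tag{30}$$

where $y'_j$ is the new observation rescaled in the same way as the sample values
$Y'_{jgk}$ that have been substituted, with

$$Y'_{jgk} = \left(\frac{\lambda_j}{\lambda_1}\right)^{1/2}\frac{Y_{jgk} - \bar{Y}_{j..}}{\sigma_{Y_j}},
\qquad (k = 1, \ldots, n_g;\; g = 1, \ldots, G;\; j = 1, \ldots, t) \tag{31}$$

and

$$\sigma_{Y_j} = \text{standard deviation of } Y_j \text{ over the dependent sample},
\qquad (j = 1, \ldots, t). \tag{32}$$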
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The  weights  X are  the  eigenvalues  previously  found.  Here,  they  serve 

as  a means  of  producing  an  arbitrary  metric  in  the  discriminant  space  which  accounts 
for  the  relative  importance  of  the  discriminant  functions.  The  metric  in  the  dis- 
criminant space  is  chosen  so  that  each  dimension  j has  zero  mean  and  variance  equal 
to  X ./  ^ 1 (j=  2,..,t),  and  the  first  discriminant  function  is  transformed  to  have 

unit  ’Jaria  nee . 

The N distances $D(Y', Y'_{gk})$ (k = 1, ..., n_g; g = 1, ..., G) are then ordered. The
point whose distance to Y' is the least is ranked first, the next closest is
ranked second, and so on, down to the point having the greatest distance. Next, a
spherical area is drawn around the point Y' by choosing the h (h ≤ N) closest points.
Associated with each of these h points is a particular group which occurred subsequent
to the observation of that point. The ratio of the observed frequencies in each of
the groups to the total number h determines estimates of the desired probabilities
$P_g$ (g = 1, ..., G).
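The nearest-neighbor estimate described above can be sketched in a few lines of Python. This is only an illustrative sketch, not Miller's program: the function name, the argument names, the 0-based group codes, and the choice to standardize each discriminant function with the development-sample mean and standard deviation are all assumptions made for the example.

```python
import numpy as np

def knn_group_probabilities(y_new, Y_dev, groups, eigenvalues, h, n_groups):
    """Estimate P_g as group relative frequencies among the h nearest points.

    y_new       : discriminant scores of the new observation (length t)
    Y_dev       : N x t discriminant scores of the development sample
    groups      : length-N integer group codes 0..n_groups-1 (hypothetical coding)
    eigenvalues : the t eigenvalues, largest first
    """
    Y_dev = np.asarray(Y_dev, dtype=float)
    groups = np.asarray(groups)
    lam = np.asarray(eigenvalues, dtype=float)

    # Assumption: standardize every discriminant function on the development sample.
    mean = Y_dev.mean(axis=0)
    std = Y_dev.std(axis=0)
    z_dev = (Y_dev - mean) / std
    z_new = (np.asarray(y_new, dtype=float) - mean) / std

    # Weighted Euclidean distance with weights lambda_j / lambda_1.
    weights = lam / lam[0]
    dist = np.sqrt((((z_dev - z_new) ** 2) * weights).sum(axis=1))

    # Keep the h closest points and count how often each group occurs among them.
    nearest = np.argsort(dist, kind="stable")[:h]
    counts = np.bincount(groups[nearest], minlength=n_groups)
    return counts / h
```

The choice of h controls the trade-off between a local estimate (small h) and a stable one (large h); the text does not state the value used, so it is left as a parameter here.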


6.  Applications  to  Precipitation  Forecasting. 

What follows next is a brief summary of one aspect of the research originally
performed by Miller in his 1961 work with the Hartford, Connecticut data. The
example outlined was the more successful of the two projects undertaken, in that
the results show a clearer discrimination between both amounts and types of
precipitation. Some of the graphs and tables presented are taken directly from the
original work with the author's permission.

Discriminant analysis seems particularly applicable to the problem of forecasting
the type and amount of precipitation that will occur at a particular location
at a specific time. The first step is to define the operationally significant
conditions, i.e., decide what phenomena we wish to predict, and how much of each
will be considered significant. Then divide or categorize the various phenomena
(or degrees of each phenomenon occurring) into distinct groups. For economy of effort,
if the number of predictor variables is large, the screening technique shown in
section 4 may be applied.

For this particular example a set of five conditions, describing various types
and degrees of precipitation, was chosen as being operationally of interest
(see Table 1). The forecasts are to be valid six hours after the daily observations
(predictors) were noted. The data base consisted of an independent sample
of 221 observations over a 1 year period, and a dependent sample of 1096 observations
over a prior 3 year period. A total of seven meteorological parameters
noted at each of twenty-five stations in the United States were chosen as representing
the total number of predictor variables possible in the sample. From these
one hundred and seventy-five predictors a group of sixteen were shown to possess
significant information as determined from measures of the generalized distance $D^2$.
Table 2 depicts the meteorological parameters in the total sample P, whereas Table
3 shows the final sixteen selected along with the calculated values for trace
$W^{-1}B$, $D^2_s - D^2_{s-1}$, and $D^2_{.05}$. Miller points out that the physical significance
of just why one particular variable has greater utility in predicting the weather
over any other variable would be very difficult to determine. Table 4 shows the
characteristic roots ($\lambda_r$) and vectors for $W^{-1}B$, and it is interesting to
note that in this example all four roots show statistical significance.

Each individual observation from the dependent and independent samples was
plotted in two dimensions with one axis $Y_1$ and the other $(\lambda_2/\lambda_1)^{.5}\, Y_2$. This is
according to the non-parametric procedure outlined in section 5. The construction
of the dimensions is done so as to enable a circle to correspond to the area
(neighborhood) described for the above mentioned non-parametric procedure for
estimating conditional group probabilities. In Figure 3 we see a composite of all
groups displayed, showing the fifty percent contour ellipses. Bivariate normality
is assumed without, necessarily, having equal dispersions. Figure 4 shows the
distributions of the dependent and independent sample points for group one only. The
fifty percent contour ellipse for group one is projected onto this plot to show the
degree of bivariate normality obtained. These latter two graphical depictions
represent data for group one of the precipitation sample only. The remaining
groups have similar, though somewhat less dense, distributions; and they are not
shown for the sake of brevity. From these two and the remaining data plots Miller
was able to conclude the following:


Table 1. DESCRIPTION OF THE PRECIPITATION GROUPS
FOR THE HARTFORD, CONNECTICUT EXAMPLE.

GROUP
NUMBER    CONDITIONS

  1       No precipitation of any kind over the forecast period.

  2       Rain or freezing rain reported at some time over the forecast
          period in the amount of at least a trace but not more than
          .05 inches. No snow or sleet reported at any time over this
          period.

  3       Snow or sleet reported at some time during the forecast period
          in the amount of at least a trace but not more than .05 inches
          of melted water equivalent.

  4       Rain or freezing rain reported at some time during the forecast
          period in the amount of greater than .05 inches. No snow or
          sleet reported at any time over this interval.

  5       Snow or sleet reported at some time over the forecast period
          in the amount of greater than .05 inches of melted water
          equivalent.

Table 2. Available Meteorological Predictors.

ELEMENT                                      NOTATION

Sea level pressure                           P
Past 3 hour change in sea level pressure     Δp
Dry bulb temperature                         T
Temperature-dew point depression             T - Td
East-West wind component                     u
North-South wind component                   v
Total cloud cover                            N


Table 2a. Specifications for the Precipitation Study.

SPECIFICATION                                NOTATION    NUMERICAL VALUE

Number of groups                             G           5
Observations in Group 1                      n1          817
Observations in Group 2                      n2          135
Observations in Group 3                      n3          29
Observations in Group 4                      n4          92
Observations in Group 5                      n5          23
Total dependent sample size                  N           1096
One less than the number of independent
  observations in dependent sample           n           1095
Total independent sample size                M           221
Number of available predictors               P           175
Forecast interval (hrs.)                     H           0-6


Table 3. THE SELECTED PREDICTORS

SELECTED
VARIABLE
X(s)       STATION                     ELEMENT    trace W^{-1}B    D^2_s - D^2_{s-1}

X(1)       Boston, Massachusetts       N          0.26?            291.30
X(2)       Portland, Maine                        0.422            168.33
X(3)       Sault Ste. Marie, Mich.     T          0.529            115.67
X(4)       Hartford, Connecticut       T - Td     0.630            108.68
X(5)       Buffalo, New York           T          0.737            114.60
X(6)       Boston, Massachusetts       u          0.806             73.55
X(7)       Hatteras, N.C.              v          0.857             5?.11
X(8)       Norfolk, Virginia           Δp         0.915             61.25
X(9)       New York, New York          T          0.962             49.40
X(10)      Portland, Maine             v          1.008             48.12
X(11)      Nantucket, Mass.            v          1.053             46.85
X(12)      Norfolk, Virginia           T          1.090             38.33
X(13)      Oklahoma City, Okla.        v          1.120             30.93
X(14)      Caribou, Maine              T          1.14?             27.70
X(15)      Boston, Massachusetts       T          1.184             37.78
X(16)      Albany, New York            v          1.211             27.43

D^2_{.05}                                                           21.68


a. For the population distributions within each of the five groups the assumptions
of bivariate normality and equal dispersions appear to be appropriate. Since
the selection criterion has optimum properties under such conditions, we can
also assume that the predictors selected are those that contain the most information
for predictive purposes.

b. There appears to be good agreement between the dependent (dots) and
independent (crosses) sample observations [...]

The data from the one, two, three, and [...] discriminant space was
then used to develop probability predict[...] for the
dependent sample only and are given [...]it were
derived.

These tables are arranged in group [...] left to
right and the various meteorological gr[...] include
the number of forecasts made within [...] probability (F);
the number of actual occurrences [...]ted probability was in that range (U);
the sum [...]casts (ΣP); the sum of the product of P and [...];
and lastly the computed values for [...]

From these two and the remaining tables (not sho[...]

a. The $\chi^2$ tests for validity on the group probabi[...]
no general tendency for the probabilities to become less v[...]
use of additional discriminant functions, this despite genera[...]
$\chi^2$ values.

b. The simplified picture shown in Figure 3 disguises the fact [...]less
tendency for the probabilities to be sharpened for conditions of snow, wh[...]
the second discriminant function $Y_2$ is used.


In addition, it was also found that a discrepancy exists between the number of
discriminant functions considered significant and the number actually producing the
best results in the independent data. In the independent data, inclusion of a third
discriminant function produces marked deterioration in the accuracies of the
predicted probabilities, even though all four of the discriminant functions tested
were found to be significant (Miller, 1961, pp 95-130).

Figure 3. Mean and fifty per cent contour ellipse, assuming bivariate normality, in
the modified discriminant space Y for each group of the precipitation example.

We can also draw a few more general conclusions from the data presented. Note
from Figure 3 that the vertical axis seems to "discriminate" groups 3 and 5 (snow)
from groups 2 and 4 (rain), whereas the horizontal axis seems to orient the groups
by degree of precipitation, with the lesser amounts to the right and the greater
amounts to the left. The success of the method can be measured to some extent by
looking at the most general classification possible: the prediction of precipitation
or no precipitation. If we use the case where the probability of no precipitation
was 0.50 or greater as our standard for categorically predicting no precipitation,
the forecasts of no precipitation numbered 164 out of 221, with 144 correct.
Precipitation was forecast 57 out of 221 times with 46 correct. In the independent
data sample there was an overall percentage of 86% correct forecasts, in good
agreement with the dependent sample figure of 87%.
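The 86% figure can be checked directly from the counts quoted above. The short sketch below takes the number of no-precipitation forecasts as 164, the value implied by the total of 221 less the 57 precipitation forecasts; the variable names are purely illustrative.

```python
# Categorical forecasts on the independent sample (counts quoted in the text).
no_precip_forecasts, no_precip_correct = 164, 144
precip_forecasts, precip_correct = 57, 46

total = no_precip_forecasts + precip_forecasts
percent_correct = 100.0 * (no_precip_correct + precip_correct) / total

print(total, round(percent_correct))  # 221 forecasts, 86 percent correct
```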

The effectiveness of the multiple discriminant analysis technique in producing
valid forecasts is demonstrated. The usefulness of MDA as a forecasting tool is,
however, limited by resources at hand. The computations are all but impossible and
highly impractical without the aid of a high speed computer. Further, other
statistical methods, requiring much less labor, have recently come to light; and these
procedures may in the long run totally replace MDA as a predictive method
(Tanur, 1972, pp 383-384).


Figure 4. Distribution of dependent sample points (dots)
and independent sample points (crosses) in the modified
discriminant space Y, group one.


CHAPTER 5


REGRESSION ESTIMATION OF EVENT PROBABILITIES
by CAPTAIN WENDELL POOL, JR.


This chapter discusses an application of Regression Estimation of Event
Probabilities (REEP) by Bryan and Singer (1965). Their project was a statistical
approach to predicting the probability of a first-term Navy recruit's reenlistment.

REEP was demonstrated as a useful prediction technique by Miller, Johnson, and
Sorenson; see Miller (1964). Its development resulted from efforts to make multiple
discriminant analysis more efficient.

Bryan  and  Singer  studied  the  following  problem: 

    Using available records on eligible first term electronics men
    (e.g., age, education, test scores, length of military duty,
    recruiting area) what sort of index or statistical digest of this
    information can be devised so as best to distinguish eventual
    reenlistees from non-reenlistees?

Their approach was to identify the independent components of significant data
bearing on the question, then to develop a prediction formula into which an
individual's particular data could be inserted. These data, termed variables, were
selected from the expansive quantity of information compiled on each enlistee at
enlistment. The first phase of the Navy study involved a partial screening of the
possible significant variables, and eliminated many as not bearing on the question
(e.g., height, weight, color of hair). In the second phase, that in which Bryan and
Singer were involved, the variables were more carefully screened using a multiple
correlation approach. Reenlistment rates were calculated for one variable, while
holding one or more other variables constant.

A variable that is the object of prediction or estimation will be called the
predictand (in the present application, reenlistment), and the variables used to
arrive at the prediction or estimation will be called predictors (for example,
age, education, test scores). The type of prediction problem under consideration
is that in which the predictand can assume any one of several distinct values,
levels, or states (in the application to reenlistment prediction there are two
states, to reenlist or not); and the object is to make use of the information
available in the predictors to estimate the respective probabilities associated
with each possible predictand state; that is, to estimate the chances that any
specified state will be the one that the predictand actually assumes in a given
instance.

Let the number of distinct states of the predictand be denoted by G. Unless
otherwise noted, these G states (in the case in point G = 2: to reenlist
or not) are exhaustive and mutually exclusive. If the predictors uniquely determine
the predictand, the probability is unity for some one predictand state, as fixed
by the predictors, and zero for all others. In a real situation, however, the
predictors merely influence the probability by tending to favor the occurrence of
some states more than others, depending on the given values of the predictors and
on how the probability of occurrence is distributed over all G states. The statistical
problem is to describe this distribution in terms of the predictors.

REEP uses multiple regression analysis. A dummy variable $D_g$ (g = 1, 2, ..., G) is
associated with each state, g, of the predictand: $D_g = 1$ if state g occurs; $D_g = 0$
if state g does not occur. Each dummy variable $D_g$, in turn, is treated as a
predictand, to be estimated by a separate regression function (one for each dummy
predictand). The device of using a common set of predictors for all $D_g$, as REEP
does, insures that the sum of the estimated probabilities will be identically equal
to one in every instance.
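The sum-to-one property can be checked numerically. The sketch below is only an illustration on synthetic data: the function names `fit_reep` and `predict_reep`, the three random 0/1 predictors, and the three predictand states are all hypothetical and are not taken from the Navy study. It fits one least-squares regression per dummy predictand on a common set of predictors (including a constant) and inspects the sums.

```python
import numpy as np

def fit_reep(X, states, n_states):
    """Least-squares fit of one linear regression per dummy predictand."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])    # constant + common predictors
    D = np.eye(n_states)[np.asarray(states)]          # N x G one-hot dummy predictands
    A, *_ = np.linalg.lstsq(design, D, rcond=None)    # (M + 1) x G coefficient matrix
    return A

def predict_reep(A, X):
    """Apply the G fitted regression functions to the rows of X."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])
    return design @ A

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 3))     # three hypothetical 0/1 predictors
states = rng.integers(0, 3, size=200)     # three hypothetical predictand states
A = fit_reep(X, states, 3)
row_sums = predict_reep(A, X).sum(axis=1)

print(np.allclose(row_sums, 1.0))         # estimated probabilities sum to one
print(np.round(A.sum(axis=1), 10))        # constants sum to 1, other rows to 0
```

Note that the individual estimates are not guaranteed to lie between zero and one; only their sum is fixed by the construction.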

In the strict definition of the term, a regression function defines the
conditional mean value of a predictand for any specified set of values of the
predictors. The true conditional mean value of a dummy variable, $D_g$, is identically
equal to the relative frequency (hence, the conditional probability) of the
occurrence of state, g, under the conditions defined by the predictors. If the
exact mathematical specification of the regression function could be given, the true
conditional probabilities could be determined from it.

In actuality, the mathematical specification of the regression function is not
available. As a serviceable approximation to that function, REEP uses a linear
expansion in terms of dummy variables, constructed from the predictors. These dummy
variables can represent simple classes pertaining to individual variables, or, if
desired, compound classes made of combinations of two or more variables.


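In what follows, the term "regression function" will be applied to the expansion
employed in REEP. The regression function $\hat{D}_g$ for $D_g$ is of the form:

$$\hat{D}_g = A_{0g} + A_{1g} x_1 + A_{2g} x_2 + \cdots + A_{Mg} x_M \qquad (g = 1, \ldots, G) \tag{5-1}$$

The predictors $x_1, x_2, \ldots, x_M$ are selected by screening (see Chapter 3). The
base constant, $A_{0g}$, and the coefficients $A_{1g}, \ldots, A_{Mg}$ are determined by least squares,
so as to minimize the average value of the squared discrepancy $(\hat{D}_g - D_g)^2$.
The resulting estimates satisfy

$$\hat{D}_1 + \hat{D}_2 + \cdots + \hat{D}_G = 1 \tag{5-2}$$

Following is an analytical proof of (5-2).
Expanding Equation (5-1):

$$\begin{aligned}
\hat{D}_1 &= A_{01} + A_{11} x_1 + A_{21} x_2 + \cdots + A_{M1} x_M \\
\hat{D}_2 &= A_{02} + A_{12} x_1 + A_{22} x_2 + \cdots + A_{M2} x_M \\
&\;\;\vdots \\
\hat{D}_G &= A_{0G} + A_{1G} x_1 + A_{2G} x_2 + \cdots + A_{MG} x_M
\end{aligned} \tag{5-3}$$

We shall prove that $A_{01} + A_{02} + \cdots + A_{0G} = 1$ and that
$A_{m1} + A_{m2} + \cdots + A_{mG} = 0$ for m = 1, 2, ..., M, where M is the number of
selected predictors. This is sufficient to prove that
$\hat{D}_1 + \hat{D}_2 + \cdots + \hat{D}_G = 1$.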

The matrix equation for generating the regression coefficients in the g-th
equation in (5-3) is:

$$A_g = C^{-1} X' D_g \tag{5-4}$$

in which the separate terms are defined as follows:

$A_g$ is a column vector with M + 1 elements, $(A_{0g} \; A_{1g} \; \ldots \; A_{Mg})'$.

C is a square matrix of order M + 1 consisting of sums, sums of squares, and
sums of cross-products of the predictor variables:

$$C = \begin{pmatrix}
N & \sum x_1 & \sum x_2 & \cdots & \sum x_M \\
\sum x_1 & \sum x_1^2 & \sum x_1 x_2 & \cdots & \sum x_1 x_M \\
\sum x_2 & \sum x_1 x_2 & \sum x_2^2 & \cdots & \sum x_2 x_M \\
\vdots & \vdots & \vdots & & \vdots \\
\sum x_M & \sum x_1 x_M & \sum x_2 x_M & \cdots & \sum x_M^2
\end{pmatrix} \tag{5-5}$$

All summations are from 1 to N, where N is the number of cases in the sample.

X' is an M + 1 by N matrix consisting of the individual values of the predictor
variables:

$$X' = \begin{pmatrix}
1 & 1 & \cdots & 1 \\
x_{11} & x_{12} & \cdots & x_{1N} \\
x_{21} & x_{22} & \cdots & x_{2N} \\
\vdots & \vdots & & \vdots \\
x_{M1} & x_{M2} & \cdots & x_{MN}
\end{pmatrix} \tag{5-6}$$

$D_g$ is a column vector with N elements consisting of the individual values of
the g-th dummy predictand:

$$D_g = (D_{1g} \; D_{2g} \; \ldots \; D_{Ng})' \tag{5-7}$$

Equation (5-4) expresses a single set of regression coefficients. The matrix
equation for all G sets of coefficients is

$$A = C^{-1} X' D \tag{5-8}$$

where A is an M + 1 by G matrix consisting of the G column vectors $A_1, A_2, \ldots, A_G$.
Similarly, D is an N by G matrix consisting of the G column vectors $D_1, D_2, \ldots, D_G$.

Define a column vector, e, consisting of G elements, each of which is unity:
$e = (1 \; 1 \; \ldots \; 1)'$. Post-multiplying both sides of equation (5-8) by e gives

$$Ae = C^{-1} X' D e \tag{5-9}$$

Note that Ae gives the sums that we require. That is, the m-th element of Ae is
$A_{m1} + A_{m2} + \cdots + A_{mG}$, and m ranges from zero through M.

Consider now the right-hand side of equation (5-9). De is a column vector with
N elements, of which the n-th element is:

$$D_{n1} + D_{n2} + \cdots + D_{nG}$$

This sum is identically equal to unity for all n, because one and only one of the
G states (g) must occur. For that state $D_{ng}$ takes on the value 1, while the
remaining D's are equal to zero.

Consider X'De. This is a column vector with M + 1 elements:

$$\left( N \;\; \sum x_1 \;\; \sum x_2 \;\; \ldots \;\; \sum x_M \right)'$$

This is precisely the first column of the matrix C. Therefore

$$C^{-1} X' D e = (1 \; 0 \; 0 \; \ldots \; 0)' \tag{5-10}$$

by the definition of the inverse of a matrix. Referring to equation (5-9), we see
that

$$\begin{aligned}
A_{01} + A_{02} + \cdots + A_{0G} &= 1 \\
A_{m1} + A_{m2} + \cdots + A_{mG} &= 0 \qquad (m = 1, \ldots, M)
\end{aligned} \tag{5-11}$$

which was to be proved.

Let us now return to the problem of the reenlistees. The next task is to
select the variables.

The first step is to break the ordinary variables into groups of dummy variables.
For example, age is broken into various tentative dummy variables, $T_q$, such that:

$T_1$ = 1 if age is less than 17 years

$T_2$ = 1 if age 17 - 20*

$T_3$ = 1 if age 20 - 22

$T_4$ = 1 if age 22 - 27

$T_5$ = 1 if age 27 - 32

$T_6$ = 1 if age is greater than 32 years.

* The upper limit of each group is not inclusive; i.e., 17 - 20 means 17 to
19.999...
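The age classes above can be coded mechanically. The following is a minimal sketch, assuming ages are given in (possibly fractional) years; the names `AGE_BREAKS` and `age_dummies` are hypothetical, and an age of exactly 32 is assigned to the last class so that the six classes stay exhaustive, which is an assumption not spelled out in the text.

```python
import bisect

# Lower limits of classes T_2..T_6; upper limits are not inclusive.
AGE_BREAKS = [17, 20, 22, 27, 32]

def age_dummies(age):
    """Return the six 0/1 dummy variables [T_1, ..., T_6] for one age in years."""
    # bisect_right counts how many break points are <= age, which is the
    # zero-based index of the class the age falls in.
    index = bisect.bisect_right(AGE_BREAKS, age)
    return [1 if i == index else 0 for i in range(len(AGE_BREAKS) + 1)]

print(age_dummies(19.999))  # [0, 1, 0, 0, 0, 0]
print(age_dummies(20))      # [0, 0, 1, 0, 0, 0]
```

Exactly one dummy is 1 for any age, which is the "exhaustive and mutually exclusive" property required of the tentative predictors.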

The above dummy variables are exhaustive and mutually exclusive; i.e., one
must occur, and no other will occur simultaneously. The division of the ordinary
variables into classes may be accomplished arbitrarily (e.g., 2 year steps) or
intuitively from prior experience (as in the above example). A caution is in order.
The predictor screening technique will not generate finer divisions, so in starting
out one should not combine possibly significant groups. The screening procedure
itself will safely accomplish any such combining and account for most of the
predictive information.


Let the initial set of tentative dummy predictors under consideration be
designated as $T_1, T_2, \ldots, T_Q$. The screening involves the computation of
variance-ratio statistics $F_k$, where an individual $F_k$ tests the significance of an
additional predictor. If $R_k$ denotes the multiple correlation coefficient computed
from k predictors, and d.f. stands for the estimated number of degrees of freedom
left at stage k, the variance ratio $F_k$ used to test the k-th selection is given by
the equation

$$F_k = \frac{R_k^2 - R_{k-1}^2}{1 - R_k^2} \cdot (\text{d.f.}) \tag{5-12}$$

Assuming independent observations, the value of d.f. for the k-th selection is
N - k - 1, where N is the sample size. See Miller (1964) for a proper level of
significance for F.

Screening accounts for all G predictand states simultaneously and is done as
follows. Compute a value of F for each tentative predictor $T_q$ (q = 1, 2, ..., Q) in
relation to each dummy predictand $D_g$ (g = 1, 2, ..., G). At the first stage of
predictor selection, there will be G x Q values of F (since G values will be
obtained for each $T_q$). Denote by $x^1$ the predictor that yields the largest single value
of F out of all of these G x Q values. This predictor $x^1$ is called the first
predictor.


The screening process is now repeated to select a second predictor. For each
$D_g$, trial multiple correlations using two predictors are computed. The two
predictors on any trial are $x^1$ and one of the remaining T's. There will be G(Q - 1)
such multiple correlations with the same number of F-values. The trial predictor
yielding the largest value of F among these G(Q - 1) values is selected as the
second predictor and is denoted by $x^2$.

The screening is continued to select third, fourth, and further predictors
until S predictors $x^1, x^2, \ldots, x^S$ have been chosen. As each predictor is selected,
a statistical test comparing the highest computed F-value with a certain critical
value of F is employed to decide whether the proposed, selected predictor appears
to be useful. The termination point S is established by the fact that $x^S$ passes
this test, but the next candidate (which, if successful, would be called $x^{S+1}$) fails
it.
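The forward screening loop can be sketched as follows. This is an illustrative outline only, not the REEP program: the function names are hypothetical, the variance ratio is taken as the usual statistic for one added predictor, (R_k^2 - R_{k-1}^2) / (1 - R_k^2) times N - k - 1, and the critical value `f_critical` is left as a user-supplied parameter because the text defers its choice to Miller (1964). The sketch assumes no predictand is constant and no fit is perfect.

```python
import numpy as np

def r_squared(columns, d):
    """Squared multiple correlation of dummy predictand d on the given columns."""
    design = np.column_stack([np.ones(len(d))] + columns)
    coef, *_ = np.linalg.lstsq(design, d, rcond=None)
    residual = d - design @ coef
    total = ((d - d.mean()) ** 2).sum()
    return 1.0 - (residual ** 2).sum() / total

def forward_screen(T, D, f_critical):
    """Return indices of the columns of T selected by forward screening.

    T : N x Q matrix of tentative dummy predictors
    D : N x G matrix of dummy predictands
    """
    T = np.asarray(T, dtype=float)
    D = np.asarray(D, dtype=float)
    n, q = T.shape
    n_states = D.shape[1]
    selected = []
    r2_prev = np.zeros(n_states)          # R^2 per predictand before this stage
    while len(selected) != q:
        k = len(selected) + 1
        dof = n - k - 1                   # degrees of freedom left at stage k
        best = None                       # (largest F, candidate index, R^2 vector)
        for cand in range(q):
            if cand in selected:
                continue
            cols = [T[:, j] for j in selected + [cand]]
            r2 = np.array([r_squared(cols, D[:, g]) for g in range(n_states)])
            f_max = ((r2 - r2_prev) / (1.0 - r2) * dof).max()
            if best is None or f_max > best[0]:
                best = (f_max, cand, r2)
        if not best[0] >= f_critical:     # next candidate fails the test: stop
            break
        selected.append(best[1])
        r2_prev = best[2]
    return selected
```

Each stage refits G regressions for every remaining candidate, so the cost grows with G x Q per stage; this brute-force form is chosen for clarity rather than speed.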

The process just described is called forward screening to distinguish it from
a different, but related, selection process, called backward screening. In backward
screening, a definite set of B (B ≤ Q) trial predictors is chosen to begin
with, and a regression formula based on all B predictors is determined. The least
important predictor is then identified by calculating the increase in mean square
error due to the omission of each predictor, in turn, when the other B - 1
predictors are retained. If the least important predictor is judged non-significant,
it is eliminated. The tests are applied again to the remaining set of B - 1
predictors, and the deletion process is continued in a stepwise fashion, analogous
to that used in forward screening. Because forward screening can cope with a much
larger set of tentative predictors than can backward screening, it was chosen as
the method to be utilized in REEP.

The ordinary formulas for estimating the sampling variability of regression
constants do not hold when the predictors are required to meet preliminary tests of
significance, as they are in selective screening. The most important single question
to answer is not that of the sampling behavior of separate coefficients, but
rather that of the sampling behavior of the estimated regression function as a
whole. In REEP, the latter question is attacked by reserving an independent sample
for verification.

By a randomization technique, the initial sample is divided into two parts.
One part, usually the larger, is called the developmental sample and is used for all
processes involving setting up the problem: objective dummying, predictor screening,
fitting of constants. The other part, called the verification sample, is used
solely to obtain estimates of predictive accuracy when the regression formulas are
applied to independent data. The program can accept a developmental sample size of
about 10,000 and, if desired, an even larger sample size for verification.
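A random division of this kind might be sketched as below. The text does not describe the randomization technique actually used, so the shuffle-and-cut approach, the function name `split_sample`, and the default fraction and seed are all assumptions made for illustration.

```python
import random

def split_sample(cases, verification_fraction=0.1, seed=0):
    """Randomly divide cases into (developmental, verification) samples."""
    cases = list(cases)
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    rng.shuffle(cases)
    n_verify = round(len(cases) * verification_fraction)
    # The first n_verify shuffled cases are held out for verification only.
    return cases[n_verify:], cases[:n_verify]

# Sizes matching the reenlistment study: 6372 developmental, 703 verification.
developmental, verification = split_sample(range(7075), 703 / 7075)
print(len(developmental), len(verification))
```

The verification cases must play no part in dummying, screening, or fitting; otherwise the accuracy estimated from them is no longer independent.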

Predictive performance is measured by the correspondence between $\hat{D}^*_g$ and $D_g$ in
the verification sample. ($\hat{D}^*_g$ reduces to $\hat{D}_g$ if no adjustment is required; an
adjustment is made only because probabilities cannot be less than zero or greater
than unity.)

An overall measure of correspondence between $\hat{D}^*_g$ and $D_g$ is given by the
mean-square error, as defined by the Brier P-Score. For a single probability forecast
of G states, the P-Score is defined as

$$\text{P-Score} = \sum_{g=1}^{G} \left( \hat{D}^*_g - D_g \right)^2 \tag{5-13}$$

A P-Score of 0.0 indicates a perfect forecast; the poorest score is 2.0, which
results when for some value g, $\hat{D}^*_g = 1$, whereas in fact there exists some other
value g' such that $D_{g'} = 1$. In comparing two forecasts of the same events, the
lower P-Score indicates the better forecast. For a series of N probability
forecasts of G states, the P-Score is defined as follows:

$$\text{P-Score} = \frac{1}{N} \sum_{i=1}^{N} \sum_{g=1}^{G} \left( \hat{D}^*_{gi} - D_{gi} \right)^2 \tag{5-14}$$
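The P-Score for a series of forecasts is simple to compute. The sketch below assumes the forecasts are supplied as an N x G array of probabilities and the outcomes as an N x G array of 0/1 indicators; the function name `p_score` is illustrative.

```python
import numpy as np

def p_score(forecasts, outcomes):
    """Brier P-Score: squared error summed over states, averaged over forecasts."""
    forecasts = np.asarray(forecasts, dtype=float)   # N x G forecast probabilities
    outcomes = np.asarray(outcomes, dtype=float)     # N x G observed 0/1 indicators
    return ((forecasts - outcomes) ** 2).sum(axis=1).mean()

print(p_score([[1.0, 0.0]], [[1, 0]]))   # perfect forecast: 0.0
print(p_score([[1.0, 0.0]], [[0, 1]]))   # poorest possible forecast: 2.0
print(p_score([[0.5, 0.5]], [[1, 0]]))   # even-odds forecast: 0.5
```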


Bryan and Singer used two types of variables in this application of REEP:
univariate and bivariate.* In one model, Model A, only univariates were allowed to
be selected for predicting reenlistment action, while in their second model, Model
B, both univariates and bivariates were considered. Because of the completeness of
their work, we can evaluate how much, if any, improvement in prediction is obtained
by including bivariates as well as univariates.

Of 61 dummy variables under consideration as possible predictors in Model A,
seven were selected as significant by the screening procedure. Therefore, the REEP
regression function for estimating reenlistment rate is of the form:

$$\hat{D}_1 = B_0 + B_1 x^1 + \cdots + B_7 x^7 \tag{5-15}$$


The selected predictors (designated by x's) and values of the regression
coefficients (B's) are given in Table 5-1. The notation $x^1$ represents the most
significant predictor, $x^2$ the second most significant predictor, or more precisely
the most significant adjunct to $x^1$, and so on in order of superscript, so that
$x^7$ represents the seventh most significant.

* Univariates are used here to mean dummy variable predictors, while bivariates are
joint dummy predictors.


TABLE 5-1

TERMS IN $\hat{D}_1$ FOR MODEL A

Predictor Symbol     Additive
                     Constant   x^1     x^2     x^3     x^4     x^5     x^6     x^7

Regression
Coefficients
  Symbol             B_0        B_1     B_2     B_3     B_4     B_5     B_6     B_7
  Value              .272       -.063   .292    .075    .059    .281    -.047   .082

To  use  the  data  in  Table  5-1  for  deriving  a reenlistment  rate  for  any  given 
individual  is  a relatively  simple  matter.  The  coefficients  of  the  characteristics 
that  pertain  to  the  individual  are  added  to  the  baseline,  the  "additive  constant". 

For  example,  if  none  of  the  seven  selected  categories  pertain  to  an  individual, 
then  his  predicted  reenlistment  rate  is  simply  .272,  which  is  a little  above  average. 
If  categories  1,  2,  and  4 pertain,  then  his  predicted  probability  is  .560  (.272  - 
.063  + .292  + .059),  which  is  quite  high. 
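The worked example above amounts to adding the coefficients of the categories that apply to the additive constant. A minimal sketch follows, using the coefficient values as listed in Table 5-1 (with index 0 standing for the additive constant); the dictionary and function names are hypothetical.

```python
# Model A coefficients from Table 5-1; key 0 is the additive constant.
MODEL_A = {0: 0.272, 1: -0.063, 2: 0.292, 3: 0.075,
           4: 0.059, 5: 0.281, 6: -0.047, 7: 0.082}

def reenlistment_probability(categories):
    """Predicted reenlistment probability given the selected categories (1-7) that apply."""
    return MODEL_A[0] + sum(MODEL_A[c] for c in set(categories))

print(round(reenlistment_probability([]), 3))         # 0.272, the baseline
print(round(reenlistment_probability([1, 2, 4]), 3))  # 0.56, as in the text
```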

The  Brier  P-Score  for  the  developmental  sample  (N=6372)  was  .3703;  that  for  the 
verification  sample  (N=703)  was  .3707.  Hence,  overall  predictive  performance,  as 
measured  by  the  Brier  P-Score,  was  very  nearly  the  same  on  the  verification  sample 
as  on  the  developmental  sample,  thus  lending  credence  to  the  method.  If  the 
population  itself  were  to  undergo  basic  changes  in  the  relationships  among  the 
variables,  the  present  regression  function  should  not  be  expected  to  apply. 

In  Model  B,  both  univariate  and  bivariate  predictors  were  made  available  for 
selection.  The  results  using  Model  B parallel  those  of  Model  A. 

Of 214 dummy variables (61 univariates and 153 bivariates) considered as
possible predictors in Model B, seven were selected as significant by the screening
procedure. The REEP regression function for estimating reenlistment rate thus
reduced to the same form as in Model A:

$$\hat{D}_1 = C_0 + C_1 y_1 + \cdots + C_7 y_7 \tag{5-16}$$

Both the selected predictors and the regression coefficients differed from those
derived in Model A. The selected predictors (designated by y's) and values of the
coefficients (C's) are shown in Table 5-2. As in Model A, the subscripts indicate
the rank order of selection.


TABLE 5-2

TERMS IN $\hat{D}_1$ FOR MODEL B

Predictor Symbol     Additive
                     Constant   y_1     y_2     y_3     y_4     y_5     y_6     y_7

Regression
Coefficients
  Symbol             C_0        C_1     C_2     C_3     C_4     C_5     C_6     C_7
  Value              .257       -.068   .393    -.053   .096    .338    .056    .071

The Brier P-Score for the developmental sample of Model B (N = 6372) was
0.3687; that for the verification sample (N = 703) was 0.3694. The indicated
superiority of Model B over Model A, shown by the slightly (but statistically
significantly) lower P-Score of the former in the developmental sample, was verified
in the verification sample.

Bryan and Singer showed that both models yield valid estimates of reenlistment
probability, but that Model B had greater capacity for sorting out departures from
average. On comparing the predictors selected under the two models, it was found
that six of the seven selected predictors were closely related.

A study,  such  as  that  performed  by  Bryan  and  Singer,  will  remain  valid  as  long 
as  there  are  no  significant  changes  in  either  the  predictors  or  their  relative 
weight.  Consequently,  the  use  of  such  a scheme  should  employ  frequent  verification 
as  an  indicator  of  the  need  to  update  the  model. 


CANONICAL CORRELATION APPLICATIONS

by  Lt  Jeanette  M.  Heumann 


INTRODUCTION 

The key to marketing success lies in matching a buyer with a product which will fill his or her
needs. A buyer study provides a concrete means of diagnosing the buyer-product relationship. A
buyer study statistically relates buyer characteristics to product characteristics so that, for
a given buyer, a probability distribution may be predicted for the size and type of that buyer's
purchase. A study such as this, which examines buyer characteristics versus those of a particular
product, is an important management information tool. An in-depth statistical analysis of this
relationship can provide a basis for company goal setting and planning. It will point out a
company's strengths and weaknesses in its marketing operations. A buyer study should
examine several areas. It should tell who your buyers are and what products they are likely to buy.
Buyers may be identified by variables such as sex, age, income, marital status, number of
dependents, and occupation. A buyer study also allows a company to evaluate its product performance
as compared to peer companies' product performance. In addition, a buyer study will identify a
company's principal markets. Markets are defined as groups of individuals who possess homogeneous
characteristics such as marital status, occupation, income, or age. Finally, a company buyer study
will clearly indicate, by means of a statistical study, what is sold to a company's principal
markets.

Buyer studies have long been used by insurance companies to evaluate new markets or territories,
in selective advertising, in persistency evaluation, in quota setting, in appraising new product
potential, and in assisting new agents. A set of buyer characteristics or variables is related
to a set of product variables through the use of statistical procedures
such as canonical correlation. This statistical procedure was initially developed by Hotelling
(1935).

Insurance companies can collect "families" of information about buyers (sex, income, age,
etc.). Canonical correlation methods treat these "families" of information as separate entities
and yet allow the probability of occurrence of one type of family information to be calculated
from another family of information.

This paper will examine work done by Dr. R. C. Miller for the Life Insurance Marketing and
Research Association (LIMRA), which discusses the buyer-product relationship as applied to the
insurance industry. This work relates buyer variables such as:

    Marital status
    Age
    Resident state
    Income
    Occupation
    Age and income combined
    Age, income, and sex combined
    Income and occupation combined

to product variables such as:

    Mode of payment
    Type of policy
    Amount of policy
    Policy premiums
    Type of policy combined with mode of payment

Approximately twenty thousand United States ordinary policies sold to adults in 1970 were used in
analyzing the probability of a particular individual buying one of a number of policies which
differed in size and type.

Terms and Mathematical Symbols


The  following  is  a list  of  terms  and  symbols  which  will  be  used  extensively  in  this  paper 
and  with  which  the  reader  should  be  familiar. 


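1. $\underline{M}$ - Underlining signifies a matrix

2. $x_p$ - Variable describing a product characteristic

3. $y_q$ - Variable describing a buyer characteristic

4. $a_{tp}$ - Coefficient of x

5. $b_{tq}$ - Coefficient of y

6. a and b - Two sets of weights which maximize the correlation between the derived canonical
variates, where

$$
a = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1p} \\
a_{21} & a_{22} & \cdots & a_{2p} \\
\vdots & \vdots &        & \vdots \\
a_{t1} & a_{t2} & \cdots & a_{tp}
\end{bmatrix}
\qquad \text{and} \qquad
b = \begin{bmatrix}
b_{11} & b_{12} & \cdots & b_{1q} \\
b_{21} & b_{22} & \cdots & b_{2q} \\
\vdots & \vdots &        & \vdots \\
b_{t1} & b_{t2} & \cdots & b_{tq}
\end{bmatrix}
$$

7. $V_1, V_2, \ldots, V_t$ - Linear functions, where each V represents a linear combination of x variables

8. $W_1, W_2, \ldots, W_t$ - Linear functions, where each W represents a linear combination of y variables

9. R - Correlation matrix, where

$$R = \begin{bmatrix} R_{11} & R_{12} \\ R_{21} & R_{22} \end{bmatrix}$$

10. $R_{11}$ - The intercorrelation among the x's

11. $R_{22}$ - The intercorrelation among the y's

12. $R_{12}$ - The intercorrelation of x's and y's

13. $R_{21}$ - The transpose of $R_{12}$

14. $\lambda$ - Lagrange multiplier

15. $\Lambda$ - Wilks' criterion, a likelihood-ratio criterion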

16. SSCP Matrix (S) - The sums of squares and cross products matrix; see Tatsuoka (1971)

17. WLLP - A whole life-limited pay insurance policy

18. WLCP - A whole life-continuous pay insurance policy

19. MODL - A modified life insurance policy

20. ENDR - An endowment and retirement insurance policy

21. LEVT - Level term insurance policy; DECT - Decreasing term insurance policy

22. COMB - A combination policy

23. A posteriori probabilities - Conditional probabilities over the states of nature, or
predictand groups, for given predictor values

24. A priori probabilities - Unconditional probabilities over predictand groups of the states of
nature

25. Mahalanobis $D^2$ - A statistic utilized for testing the significance of p variates to discriminate
between two groups which have equal or unequal dispersion but different means

26. Unit matrix - A matrix which is symmetric and has diagonal elements equal to unity while all
other elements equal zero

Technical Details


As was mentioned earlier, the particular insurance case study being examined used approximately
20,000 ordinary policies sold in the United States in 1970. The study excluded from analysis the
following types of policies:

    Policies on lives of residents of U.S. territories, Canada, and other foreign countries
    Acquired reinsurance cases
    Individual credit insurance
    Group insurance
    Annuities without insurance
    Term conversions
    Single or double premium policies


By examining the nonexcluded policies, it was possible to determine two sets of variables. The first
set of variables (x's) described product characteristics and made up the set of predictand variables.
The second set of variables (y's) described buyer characteristics and made up the set of predictor
variables. The interrelations between these two sets of measurements can be studied by canonical
correlation models. This statistical method provides the maximum correlation between linear
functions of the two sets of variables. Each pair of these linear functions is determined so
that the correlation between the new pair of canonical variates is maximized, subject to the
restriction that they are independent of (uncorrelated with) the linear combinations
derived previously.


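The general nature of canonical correlation may be best explained using algebraic means.
First, consider two simultaneous sets of t equations that contain p predictand and q predictor
variables. The two sets of equations are as follows:

"Left Side" (Predictand Equations)

$$
\begin{aligned}
V_1 &= a_{11}x_1 + a_{12}x_2 + \cdots + a_{1p}x_p \\
V_2 &= a_{21}x_1 + a_{22}x_2 + \cdots + a_{2p}x_p \\
    &\;\;\vdots \\
V_t &= a_{t1}x_1 + a_{t2}x_2 + \cdots + a_{tp}x_p
\end{aligned}
$$

"Right Side" (Predictor Equations)

$$
\begin{aligned}
W_1 &= b_{11}y_1 + b_{12}y_2 + \cdots + b_{1q}y_q \\
W_2 &= b_{21}y_1 + b_{22}y_2 + \cdots + b_{2q}y_q \\
    &\;\;\vdots \\
W_t &= b_{t1}y_1 + b_{t2}y_2 + \cdots + b_{tq}y_q
\end{aligned}
$$

(where t = min(p, q))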


We can then determine the sets of weights a and b so that the correlation between $V_1$ and $W_1$ is
higher than that of any other pair of linear functions of the x's and the y's. Likewise, $V_2$ and $W_2$ are
correlated more highly than any pair of linear functions other than $V_1$ and $W_1$, among those
uncorrelated with the first pair. This hierarchy of correlation is maintained all the way down to
the $V_t$ and $W_t$ equations. For this relationship
to exist, the weights a and b must be determined so as to maximize the correlation between
the derived canonical variates V and W. It should be noted that the special case where q > 1 and
p = 1 is a case of multiple regression and not one of canonical correlation. Canonical correlation
requires that there be both multiple predictors and multiple predictands involved. The smaller of
p and q determines the number of linear combinations that can be formed. If q is smaller than
p, then there will be q linear combinations formed. Likewise, if p is smaller than q, then p
linear combinations will be formed. Each pair of canonical variates $V_i$ and $W_i$ will have maximum
correlation, subject to the restriction that each canonical variate ($V_i$ or $W_i$) is
orthogonal to all other canonical variates on its side of the equation.

In geometric terms we can consider canonical correlation as a measure of the extent to which
individuals occupy the same relative positions in the p-dimensional space as they do in the
q-dimensional space. The buyer (predictor) variables and the insurance product (predictand)
variables might appear to possess little or no similarity when the variables are compared scale
for scale. However, canonical correlation methods readily show the system of correlation which
underlies the two sets of variables. The first step in the analysis of the canonical correlation
between the set of buyer variables and the insurance product variables is solving for $a_i$ and
$b_i$. This requires the correlation matrix R, where R equals:

$$R = \begin{bmatrix} R_{11} & R_{12} \\ R_{21} & R_{22} \end{bmatrix}$$


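As can be readily seen from the above matrix, R is itself divided into four submatrices. The four
submatrices are obtained in the following manner:

$R_{11}$ is the correlation among the x's:

$$
R_{11} = \begin{bmatrix}
1 & r_{x_1 x_2} & r_{x_1 x_3} & \cdots & r_{x_1 x_p} \\
  & 1           & r_{x_2 x_3} & \cdots & r_{x_2 x_p} \\
  &             & \ddots      &        & \vdots \\
  &             &             &        & 1
\end{bmatrix}
$$

Note: This matrix is represented as an upper right triangle matrix due to the fact that the
x-x correlations composing the lower left triangle of the matrix are simply mirror images of the
upper right triangle correlations.

$R_{22}$ is the correlation among the y's:

$$
R_{22} = \begin{bmatrix}
1 & r_{y_1 y_2} & r_{y_1 y_3} & \cdots & r_{y_1 y_q} \\
  & 1           & r_{y_2 y_3} & \cdots & r_{y_2 y_q} \\
  &             & \ddots      &        & \vdots \\
  &             &             &        & 1
\end{bmatrix}
$$

Again, the $R_{22}$ matrix (like the $R_{11}$ matrix) may be displayed as an upper right triangle matrix.

$R_{12}$ is the correlation between the x's and the y's. That is:

$$
R_{12} = \begin{bmatrix}
r_{x_1 y_1} & r_{x_1 y_2} & r_{x_1 y_3} & \cdots & r_{x_1 y_q} \\
r_{x_2 y_1} & r_{x_2 y_2} & r_{x_2 y_3} & \cdots & r_{x_2 y_q} \\
r_{x_3 y_1} & r_{x_3 y_2} & r_{x_3 y_3} & \cdots & r_{x_3 y_q} \\
\vdots      & \vdots      & \vdots      &        & \vdots \\
r_{x_p y_1} & r_{x_p y_2} & r_{x_p y_3} & \cdots & r_{x_p y_q}
\end{bmatrix}
$$

And finally, $R_{21}$, which is simply the transpose of $R_{12}$.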

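The partitioned portions of R may then be substituted into the canonical equation

$$(R_{11}^{-1} R_{12} R_{22}^{-1} R_{21} - \lambda_i^2 I)\, a_i = 0.$$

NOTE: the above equation may also be written as

$$(R_{22}^{-1} R_{21} R_{11}^{-1} R_{12} - \lambda_i^2 I)\, b_i = 0.$$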

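The equation for $a_i$ may be derived through the following procedure:

1. Let $R_{c_i} = \lambda_i$ (where i = 1, 2, ..., min(p, q)).

2. Since we are looking for linear functions that have maximum correlation, and since the
correlation of a multiple of V and a multiple of W is the same as the correlation of V and W, we can
make an arbitrary normalization of a and b. This is done such that:

$$1 = E(V^2) = a' R_{11}\, a$$
$$1 = E(W^2) = b' R_{22}\, b$$

3. $a' R_{11}\, a = b' R_{22}\, b = 1$.

4. $E(VW) = a' R_{12}\, b$.

5. Since the algebraic problem is to maximize (4) subject to (3), let

$$F(a, b) = a' R_{12}\, b - \tfrac{1}{2}\lambda\,(a' R_{11}\, a - 1) - \tfrac{1}{2}\mu\,(b' R_{22}\, b - 1)$$

where $\lambda$ and $\mu$ are Lagrange multipliers and where the factors
1/2 are introduced for numerical convenience.

6. Then if we differentiate F with respect to a and b and set the
vectors of the derivatives equal to zero, we have:

$$\frac{\partial F}{\partial a} = R_{12}\, b - \lambda R_{11}\, a = 0$$
$$\frac{\partial F}{\partial b} = R_{21}\, a - \mu R_{22}\, b = 0$$

Multiply $\partial F / \partial a$ by $a'$, and $\partial F / \partial b$ by $b'$, so that

$$a' R_{12}\, b - \lambda\, a' R_{11}\, a = 0$$
$$b' R_{21}\, a - \mu\, b' R_{22}\, b = 0$$

7. Since we know that $a' R_{11}\, a = 1$ and $b' R_{22}\, b = 1$, we can easily
see that

$$\lambda = \mu = a' R_{12}\, b.$$

8. Thus we can write:

a) $R_{12}\, b_i - \lambda_i R_{11}\, a_i = 0$

b) $R_{21}\, a_i - \lambda_i R_{22}\, b_i = 0$

9. We can then derive a single matrix equation for $a_i$ or $b_i$ if we multiply 8a by $\lambda_i$ and 8b by $R_{22}^{-1}$:

a) $\lambda_i R_{12}\, b_i = \lambda_i^2 R_{11}\, a_i$

b) $R_{22}^{-1} R_{21}\, a_i = \lambda_i b_i$

10. If we then substitute from 9b into 9a:

a) $R_{12} R_{22}^{-1} R_{21}\, a_i - \lambda_i^2 R_{11}\, a_i = 0$

b) $(R_{12} R_{22}^{-1} R_{21} - \lambda_i^2 R_{11})\, a_i = 0$

c) $(R_{11}^{-1} R_{12} R_{22}^{-1} R_{21} - \lambda_i^2 I)\, a_i = 0$

Where $|R_{11}^{-1} R_{12} R_{22}^{-1} R_{21} - \lambda^2 I| = 0$, the vectors $a_1, \ldots, a_p$ satisfy
equation 10a for $\lambda^2 = \lambda_1^2, \ldots, \lambda_p^2$ respectively. The similar equations
for $b_1, \ldots, b_q$ occur when $\lambda^2 = \lambda_1^2, \ldots, \lambda_q^2$ are substituted in

$$(R_{22}^{-1} R_{21} R_{11}^{-1} R_{12} - \lambda_i^2 I)\, b_i = 0.$$

Note: the vector $b_i$ may also be obtained from the equation

$$b_i = (R_{22}^{-1} R_{21}\, a_i) / \lambda_i.$$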

The vectors $a_i$ and $b_i$ are then applied to standard score vectors to obtain the canonical
variates V and W. The canonical correlation ($R_{c_i}$) between the ith pair of new composites is equal to
$\lambda_i$. The largest $\lambda_i^2$ is the square of the maximum possible correlation between the linear
combinations of the two sets of measurements ($R_c^2$ max $= \lambda_1^2$). Also, if it is desired to find the coefficients
of the observed deviation scores, they can be obtained by dividing the elements of $a_i$ and $b_i$ by the
standard deviation of the corresponding variables.
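The procedure above can be sketched numerically. The example below is an illustration only and is not one of the programs in the appendix: it uses NumPy, the function name `canonical_correlations` is an assumption made here, and the data are synthetic. It forms the quadruple product from the partitioned correlation matrix, takes the square roots of its largest eigenvalues as the canonical correlations, normalizes each weight vector so that a'R11 a = 1, and obtains each b from the corresponding a.

```python
import numpy as np


def canonical_correlations(R, p):
    """Solve (R11^-1 R12 R22^-1 R21 - lambda^2 I) a = 0 for a partitioned R.

    R is the (p+q) x (p+q) correlation matrix with the p x-variables first.
    Returns the canonical correlations (largest first) and the weight
    matrices A (p x t) and B (q x t), one column per canonical pair.
    """
    R = np.asarray(R, dtype=float)
    q = R.shape[0] - p
    R11, R12 = R[:p, :p], R[:p, p:]
    R21, R22 = R[p:, :p], R[p:, p:]

    # Quadruple product R11^-1 R12 R22^-1 R21, formed with linear solves
    # rather than explicit inverses.
    M = np.linalg.solve(R11, R12) @ np.linalg.solve(R22, R21)
    eigvals, eigvecs = np.linalg.eig(M)
    eigvals, eigvecs = eigvals.real, eigvecs.real

    # Keep the t = min(p, q) largest roots, in descending order.
    t = min(p, q)
    order = np.argsort(eigvals)[::-1][:t]
    lam = np.sqrt(np.clip(eigvals[order], 0.0, 1.0))
    A = eigvecs[:, order]

    # Normalize each column so that a' R11 a = 1.
    A = A / np.sqrt(np.sum(A * (R11 @ A), axis=0))
    # b_i = R22^-1 R21 a_i / lambda_i (assumes every retained lambda_i > 0).
    B = np.linalg.solve(R22, R21 @ A) / lam
    return lam, A, B


# Synthetic illustration only: 3 "x" variables and 2 "y" variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
Y = np.column_stack([X[:, 0] + 0.5 * rng.normal(size=500),
                     X[:, 1] + rng.normal(size=500)])
R = np.corrcoef(np.hstack([X, Y]), rowvar=False)
lam, A, B = canonical_correlations(R, p=3)
print("canonical correlations:", np.round(lam, 3))
```

Applying the columns of A and B to the standardized x and y scores gives the canonical variates; the correlation between the ith pair then reproduces the ith value in `lam`.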

An alternate but similar procedure for finding the canonical correlation between two sets of
variables, in this case buyer variables and product variables, is discussed by Tatsuoka in his book,
Multivariate Analysis: Techniques for Educational and Psychological Research. Tatsuoka considers
two sets of variables, each of which forms a linear combination:

$$V = a_1 x_1 + a_2 x_2 + \cdots + a_p x_p$$
$$W = b_1 y_1 + b_2 y_2 + \cdots + b_q y_q$$

From these linear functions, we must determine the two sets of coefficients $a' = [a_1, a_2, \ldots, a_p]$ and
$b' = [b_1, b_2, \ldots, b_q]$ so as to maximize the correlation between the two linear combinations. In order to
do this we must express the correlation between V and W as a function of a and b:

$$r_{vw} = \frac{\sum VW}{\sqrt{(\sum V^2)(\sum W^2)}}$$

We may express the quantities $\sum V^2$ and $\sum W^2$ as quadratic forms in the following manner:

$$\sum V^2 = a' S_{xx}\, a$$
$$\sum W^2 = b' S_{yy}\, b$$

The quantities $S_{xx}$ and $S_{yy}$ represent the Sums of Squares and Cross Products (SSCP) matrices. This
type of matrix is mentioned in the Terms section of this paper and explained fully by Tatsuoka (1971).

We can also show that $\sum VW = a' S_{xy}\, b$, where in this case $S_{xy}$ represents the p x q matrix of the
sums-of-products between the x variables and the y variables. Using the above equalities, the
formula for $r_{vw}$ may be written in the following manner:

$$r_{vw} = \frac{a' S_{xy}\, b}{[(a' S_{xx}\, a)(b' S_{yy}\, b)]^{1/2}}$$


The maximizing weights a and b are determined only up to proportionality constants. This is
because, if a and b are each multiplied by an arbitrary constant and the two constants have the
same sign, the value of the correlation between V and W obtained by using the rescaled elements as
combining weights is equal to the value which would result from the use of a and b as the weights.

The proportionality constants can be chosen so that:

$$a' S_{xx}\, a = b' S_{yy}\, b = 1$$

This results in the denominator of $r_{vw} = a' S_{xy}\, b / [(a' S_{xx}\, a)(b' S_{yy}\, b)]^{1/2}$ being equal to unity.
If Lagrange multipliers $\lambda/2$ and $\mu/2$ are introduced at this time (the factors 1/2 are
introduced merely for numerical convenience), then the function which is to be maximized is

$$F(a, b) = a' S_{xy}\, b - (\lambda/2)(a' S_{xx}\, a - 1) - (\mu/2)(b' S_{yy}\, b - 1).$$


The next step involves taking the symbolic partial derivatives of F(a, b) with respect to a and
b. The two resulting equations are then each set equal to the null vector. This gives the following
equations:

$$\frac{\partial F}{\partial a} = S_{xy}\, b - \lambda S_{xx}\, a = 0$$
$$\frac{\partial F}{\partial b'} = a' S_{xy} - \mu\, b' S_{yy} = 0'$$

which are the equations that must be satisfied by a and b in order to maximize the correlation
coefficient $r_{vw}$. The above two equations constitute necessary conditions for the desired
maximization.

The next step is to premultiply the members of $\partial F / \partial a$ by $a'$ and to postmultiply the members
of $\partial F / \partial b'$ by b as follows:

$$a' S_{xy}\, b - \lambda\,(a' S_{xx}\, a) = 0$$
$$a' S_{xy}\, b - \mu\,(b' S_{yy}\, b) = 0$$

From the preceding two equations the relationship $a' S_{xy}\, b = \lambda\,(a' S_{xx}\, a) = \mu\,(b' S_{yy}\, b)$ can easily
be seen. This relationship reduces to the form

$$a' S_{xy}\, b = \lambda = \mu$$

by recalling that $a' S_{xx}\, a = b' S_{yy}\, b = 1$.


This clearly shows that both multipliers $\lambda$ and $\mu$ are equal to the maximum value that can be
achieved by the correlation coefficient $r_{vw}$. Since $\lambda = \mu$ we may replace $\lambda$ by $\mu$ so that:

$$S_{xy}\, b - \mu S_{xx}\, a = 0 \qquad \text{or} \qquad S_{xy}\, b = \mu S_{xx}\, a$$

and

$$S_{yx}\, a - \mu S_{yy}\, b = 0.$$

If we then assume $S_{yy}$ to be nonsingular, we may express b in terms of a as follows:

$$b = (1/\mu)\, S_{yy}^{-1} S_{yx}\, a.$$

This expression may then be substituted for b in $S_{xy}\, b = \mu S_{xx}\, a$, which gives

$$(1/\mu)\, S_{xy} S_{yy}^{-1} S_{yx}\, a = \mu S_{xx}\, a.$$

If we then premultiply both members of the above equation by $\mu S_{xx}^{-1}$ we obtain the following
expressions:

$$S_{xx}^{-1} S_{xy} S_{yy}^{-1} S_{yx}\, a = \mu^2 I a$$
$$S_{xx}^{-1} S_{xy} S_{yy}^{-1} S_{yx}\, a - \mu^2 I a = 0$$

and in final form,

$$(S_{xx}^{-1} S_{xy} S_{yy}^{-1} S_{yx} - \mu^2 I)\, a = 0.$$


From this equation it can be seen that the largest eigenvalue $\mu_1^2$ of the quadruple matrix product
$S_{xx}^{-1} S_{xy} S_{yy}^{-1} S_{yx}$ gives the square of the maximum correlation coefficient. It can also be
seen that the elements of the associated eigenvector $a_1$ provide the weights by which the x set of
variables need to be combined linearly to achieve this maximum correlation. The other combining
weight vector $b_1$ may be easily obtained by substituting $a_1$ and $\mu_1$ in the equation

$$b_1 = (1/\mu_1)\, S_{yy}^{-1} S_{yx}\, a_1.$$

It is obvious then that other eigenvalues and vectors besides $\mu_1^2$ and $a_1$ may be obtained using
the general equations just developed. These equations may be utilized to determine weights a
and b when two sets of linear functions $V_1, V_2, \ldots, V_t$ and $W_1, W_2, \ldots, W_t$ are present. We start out by
finding two sets of combining weights that will maximize the specified correlation coefficient
for the resulting pair of linear combinations. The elements of the vector which are associated
with the largest eigenvalue of the quadruple product matrix make up the weights which lead to the desired
absolute maximum. This also holds true of the elements of the vectors associated with the second,
third, and final eigenvalues in descending order of magnitude, in the sense that the pair of
linear combinations formed by the elements of the second vector has the largest value of the
relevant criterion among those that are not correlated with the first pair of linear combinations, and
so forth. For any case of canonical correlation analysis, the linear combinations occur in two
sequences, one sequence for each of the two sets of variables. There is no correlation within each
sequence or between unmatched pairs of linear combinations across the two sequences. Therefore,
not only is $V_1$ uncorrelated with $V_2, V_3, \ldots,$ or $V_t$, it is also uncorrelated with $W_2, W_3, \ldots,$ or $W_t$.
It should be clear then that the only nonzero correlations occur between the corresponding members
of the paired linear combinations, such as $V_1$ and $W_1$, or $V_2$ and $W_2$. Canonical variates is the term
used to denote pairs of linear orthogonal combinations. The number of canonical variate pairs
will equal the number of variables in the smaller variable set, that is, the smaller of the two
numbers p or q. This is due to the fact that the rank of $S_{xx}^{-1} S_{xy} S_{yy}^{-1} S_{yx}$, the quadruple product
matrix whose eigenvectors and eigenvalues determine the canonical variates, is equal to q or p,
whichever is the smaller.
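The properties just described can be checked numerically. The sketch below is an illustration under stated assumptions, not Tatsuoka's program: it uses NumPy, the function name `canonical_variates` is made up for this example, and the data are synthetic. It builds the SSCP matrices from deviation scores, solves the eigenproblem of the quadruple product, and then forms the cross-products of the resulting variates.

```python
import numpy as np


def canonical_variates(X, Y):
    """Canonical variates from raw scores via the SSCP matrices."""
    Xd = X - X.mean(axis=0)          # deviation scores
    Yd = Y - Y.mean(axis=0)
    Sxx, Syy, Sxy = Xd.T @ Xd, Yd.T @ Yd, Xd.T @ Yd
    Syx = Sxy.T

    # Quadruple product Sxx^-1 Sxy Syy^-1 Syx.
    M = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Syx)
    vals, vecs = np.linalg.eig(M)
    vals, vecs = vals.real, vecs.real

    t = min(X.shape[1], Y.shape[1])  # number of canonical pairs
    order = np.argsort(vals)[::-1][:t]
    mu = np.sqrt(np.clip(vals[order], 0.0, 1.0))
    A = vecs[:, order]
    A = A / np.sqrt(np.sum(A * (Sxx @ A), axis=0))   # a' Sxx a = 1
    B = np.linalg.solve(Syy, Syx @ A) / mu           # b = (1/mu) Syy^-1 Syx a
    return mu, Xd @ A, Yd @ B


# Synthetic illustration only: p = 3 and q = 2, so two canonical pairs.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 3))
Y = np.column_stack([X[:, 0] + rng.normal(size=400),
                     X[:, 1] - X[:, 2] + rng.normal(size=400)])
mu, V, W = canonical_variates(X, Y)

# Cross-products of the variates: the unit matrix within each sequence,
# and a diagonal matrix of the canonical correlations between sequences.
print(np.round(V.T @ V, 6))
print(np.round(V.T @ W, 6))
```

With the normalization a'Sxx a = b'Syy b = 1, the within-sequence cross-products form a unit matrix and the only nonzero between-sequence entries lie on the diagonal, which is the pattern of correlations described in the text.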


An application of Bayes' Theorem is utilized to evaluate a posteriori probabilities.
Specifically, this means that all of the possible "Left Side" combinations (L) will be enumerated
for which it is desired to obtain probabilities. There will be $L \le 2^p$ "Left Side" combinations.
This may be performed for the set of "Right Side" conditions of $w = w_1, w_2, w_3, \ldots, w_t$ by utilizing
the following equation:

$$P(\ell \mid w) = \frac{f(w \mid \ell)\, g_\ell}{\sum_{\ell=1}^{L} f(w \mid \ell)\, g_\ell} \qquad (\text{where } \ell = 1, 2, 3, \ldots, L)$$

and $g_\ell$ is the a priori probability of $\ell$. In this equation $f(w \mid \ell)$ is assumed to be multivariate
normal for all L combinations with equal covariance matrix $\Sigma$, or

$$f(w \mid \ell) = (2\pi)^{-t/2}\, |\Sigma|^{-1/2} \exp\left[-\tfrac{1}{2}(w - w_\ell)' \Sigma^{-1} (w - w_\ell)\right]$$

where $\ell = 1, 2, 3, \ldots, L$ and $w_\ell$ is the vector of values that are expected to be predicted from the
"Left Side" for the $\ell$th combination. The diagonal matrix $\Sigma$ has elements whose values represent the
error variances between the pairs of linear combinations $V_1$ and $W_1$, $V_2$ and $W_2$, $V_3$ and $W_3$, ..., $V_t$ and
$W_t$. If, on the other hand, we desire to find the probabilities for a set of "Right Side" combinations (R),
we can do so in the following manner. There will be $R \le 2^q$ "Right Side" combinations. This is
performed for the set of "Left Side" conditions of $v = v_1, v_2, \ldots, v_t$ by utilizing the following
equation:

$$P(k \mid v) = \frac{f(v \mid k)\, g_k}{\sum_{k=1}^{R} f(v \mid k)\, g_k}$$

where $k = 1, 2, 3, \ldots, R$ and $g_k$ is the a priori probability of k, and $f(v \mid k)$ is again assumed to be
multivariate normal for all R combinations with equal covariance matrix $\Sigma$, or

$$f(v \mid k) = (2\pi)^{-t/2}\, |\Sigma|^{-1/2} \exp\left[-\tfrac{1}{2}(v - v_k)' \Sigma^{-1} (v - v_k)\right]$$

where $k = 1, 2, 3, \ldots, R$, and $v_k$ represents the vector of expected values predicted from the
"Left Side" for combination k. In the event that multivariate normality cannot be assumed, Fix
and Hodges (1951) utilize a nonparametric procedure which should be examined.
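The a posteriori calculation can be sketched as follows. This is a minimal illustration assuming NumPy, a diagonal error covariance matrix shared by all combinations, and made-up numbers; the function name `posterior_probabilities` and its arguments are hypothetical and do not come from the LIMRA study. Because the covariance matrix is the same for every combination, the normalizing constant of the normal density cancels in Bayes' Theorem and only the exponent needs to be evaluated.

```python
import numpy as np


def posterior_probabilities(w, expected_w, priors, error_variances):
    """A posteriori probabilities of each combination given observed variates w.

    expected_w      : (L, t) expected canonical variates for each combination
    priors          : (L,)   a priori probabilities g (all assumed > 0)
    error_variances : (t,)   diagonal of the shared covariance matrix
    """
    w = np.asarray(w, dtype=float)
    expected_w = np.asarray(expected_w, dtype=float)
    priors = np.asarray(priors, dtype=float)
    error_variances = np.asarray(error_variances, dtype=float)

    # Exponent of the normal density; the constant factor is common to
    # all combinations and cancels when the posterior is normalized.
    diff = w - expected_w
    log_density = -0.5 * np.sum(diff ** 2 / error_variances, axis=1)

    log_post = np.log(priors) + log_density
    log_post -= log_post.max()        # guard against underflow
    post = np.exp(log_post)
    return post / post.sum()


# Made-up illustration: three combinations described by two canonical variates.
expected = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]
g = [0.5, 0.3, 0.2]
posterior = posterior_probabilities([1.0, 0.2], expected, g, [1.0, 1.0])
print(np.round(posterior, 3))
```

Working with logarithms and subtracting the largest value before exponentiating is a numerical convenience only; it leaves the normalized probabilities unchanged.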


Significance tests can be used to decide whether a significant linear relationship exists
between the two sets of variables. It should be noted that, since discriminant analysis may be
regarded as a special case of canonical correlation, significance tests for canonical variate pairs
will closely resemble those employed for discriminant analysis. The first step in describing the
procedure used in overall significance testing is to define Wilks' $\Lambda$ criterion:

$$\Lambda = \prod_{i=1}^{q} (1 - \lambda_i^2)$$

where q < p and where $\Lambda$ is a statistic which is inversely related to the strength of relationship:
the smaller the value of $\Lambda$, the greater the relationship strength. After $\Lambda$ has been computed from
the previous equation, an overall significance test may be carried out on the canonical variate
pairs by using the chi-square approximation ($\chi^2$). This approximation for the distribution of $\Lambda$
will provide a test for the null hypothesis that the p variates are not related to the q variates. The
following equality relates $\chi^2$ to $\Lambda$:

$$\chi^2 = -[N - .5(p + q + 1)] \ln \Lambda$$

with pq degrees of freedom and where N is the total sample size. If we reject the null hypothesis,
we can remove the contribution of the first root of $\Lambda$ and then test the significance of the q-1 roots
as follows:

$$\Lambda' = \prod_{i=2}^{q} (1 - \lambda_i^2),$$

$$\chi^2 = -[N - .5(p + q + 1)] \ln \Lambda'$$

which has (p-1)(q-1) degrees of freedom.

The general equation used for r roots removed is

$$\Lambda' = \prod_{i=r+1}^{q} (1 - \lambda_i^2)$$

where $\chi^2$ is distributed so that there are (p-r)(q-r) degrees of freedom.

It was originally thought that only the quantity $\lambda_1^2$ and the corresponding canonical correlation
$R_c = \lambda_1$ were of any interest. Further examination has shown that, depending upon the research question,
roots other than $\lambda_1$ may be relevant. It has been found that one or more subsets of the predictor
variables may be related to one or more of the respective subsets of the criterion or predictand
variables. The combination of variables in the predictor set y that are related to a predictand
subset in x can be determined if we inspect the elements of the two vectors $a_i$ and $b_i$ which are
associated with the quantity $\lambda_i$. We know that each $\lambda_i$ will be equal to the correlation between the linear
functions of the right and left variables, which are formed by using $b_i$ and $a_i$ respectively. The
chi-square approximation tests that have been defined will show how many of the functions allow
statistical interpretation.
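The sequence of tests can be sketched in Python as follows. This is an illustration only: the function name `wilks_sequential_tests` and the numbers in the example are assumptions, and the sketch stops at the chi-square statistic and its degrees of freedom, leaving the comparison with a chi-square table to the reader.

```python
import math


def wilks_sequential_tests(canonical_corrs, n, p, q):
    """Chi-square approximation for Wilks' lambda with r roots removed.

    canonical_corrs : canonical correlations, largest first
    n               : total sample size N
    p, q            : number of variables in each set
    Returns one dictionary per value of r = 0, 1, 2, ...
    """
    results = []
    for r in range(len(canonical_corrs)):
        # Product of (1 - lambda_i^2) over the roots that remain.
        wilks = math.prod(1.0 - c ** 2 for c in canonical_corrs[r:])
        chi_square = -(n - 0.5 * (p + q + 1)) * math.log(wilks)
        results.append({
            "roots_removed": r,
            "wilks_lambda": wilks,
            "chi_square": chi_square,
            "df": (p - r) * (q - r),
        })
    return results


# Made-up illustration: two canonical correlations from p = 3 and q = 2 variables.
tests = wilks_sequential_tests([0.5, 0.2], n=100, p=3, q=2)
for row in tests:
    print(row)
```

Each row is compared with the chi-square distribution having the listed degrees of freedom; testing stops at the first value of r for which the remaining roots are not significant.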


As previously discussed, the insurance case study under examination involved a sample size
of approximately 20,000 U.S. policies. This particular case study utilized the following two sets
of variables:

"Left Side" Variables

x_1  = 1 if policy whole life continuous pay, 0 otherwise.
x_2  = 1 if policy whole life limited pay, 0 otherwise.
x_3  = 1 if policy modified life, 0 otherwise.
x_4  = 1 if policy endowment or retirement, 0 otherwise.
x_5  = 1 if policy level term, 0 otherwise.
x_6  = 1 if policy decreasing term, 0 otherwise.
x_7  = 1 if policy family plan or combination, 0 otherwise.
x_8  = 1 if policy others with term, 0 otherwise.
x_9  = 1 if policy > $50,000, 0 otherwise.
x_10 = 1 if policy $25,001-$50,000, 0 otherwise.
x_11 = 1 if policy $10,001-$25,000, 0 otherwise.
x_12 = 1 if policy $10,000, 0 otherwise.
x_13 = 1 if policy less than $10,000, 0 otherwise.

"Right Side" Variables

y_1  = 1 if male, 0 otherwise.
y_2  = 1 if female, 0 otherwise.
y_3  = 1 if 15-19 years old, 0 otherwise.
y_4  = 1 if 20-24 years old, 0 otherwise.
y_5  = 1 if 25-29 years old, 0 otherwise.
y_6  = 1 if 30-39 years old, 0 otherwise.
y_7  = 1 if single, 0 otherwise.
y_8  = 1 if married, 0 otherwise.
y_9  = 1 if divorced, widowed, separated, 0 otherwise.
y_10 = 1 if income < $3000, 0 otherwise.
y_11 = 1 if income $3000-4999, 0 otherwise.
y_12 = 1 if income $5000-7499, 0 otherwise.
y_13 = 1 if income $7500-9999, 0 otherwise.
y_14 = 1 if income $10,000-24,999, 0 otherwise.
y_15 = 1 if income $25,000 or over, 0 otherwise.
y_16 = 1 if not gainfully employed, 0 otherwise.
y_17 = 1 if occupation professional, 0 otherwise.
y_18 = 1 if occupation semiprofessional, 0 otherwise.
y_19 = 1 if student, 0 otherwise.
y_20 = 1 if housewife, 0 otherwise.
y_21 = 1 if all others, 0 otherwise.

The sample under examination allowed the following a priori probabilities to be estimated:


ESTIMATED A PRIORI DISTRIBUTION

POLICY AMOUNT VS POLICY TYPE

           WLCP   WLLP   MODL   ENDR   LEVT   DECT   COMB   OTHR   TOTAL
> 50K      0.01   0.00   0.00   0.00   0.00   0.01   0.00   0.01   0.02
25-50K     0.02   0.00   0.00   0.01   0.01   0.02   0.01   0.01   0.09
10-25K     0.05   0.02   0.01   0.02   0.02   0.03   0.04   0.02   0.20
10K        0.10   0.04   0.02   0.03   0.01   0.01   0.01   0.03   0.23
< 10K      0.20   0.15   0.03   0.04   0.00   0.00   0.00   0.04   0.46
TOTAL      0.37   0.21   0.06   0.09   0.04   0.06   0.07   0.10   1.00

Three examples obtained by applying the results of the analysis are presented as follows:

Example 1 Buyer - The buyer is a single female, 40 or older, with an income of $3,000-4,999, and she
is considered to be a semiprofessional.


ESTIMATED PROBABILITIES

POLICY AMOUNT VS POLICY TYPE

           WLCP   WLLP   MODL   ENDR   LEVT   DECT   COMB   OTHR   TOTAL
> 50K      0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00
25-50K     0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.01
10-25K     0.01   0.01   0.00   0.00   0.00   0.00   0.01   0.00   0.03
10K        0.03   0.03   0.01   0.01   0.00   0.00   0.01   0.01   0.09
< 10K      0.25   0.44   0.03   0.05   0.00   0.00   0.01   0.04   0.87
TOTAL      ?      0.47   0.10   0.06   0.00   0.01   0.03   0.05   1.00

In the above example, the predicted conditional distribution clearly indicates several things. It
indicates that there is only a 9% probability of this buyer purchasing a term-type policy or any
policy combination involving term. This example also indicates that a buyer within this category
shows only a 13% probability of purchasing a policy of $10,000 or more. In fact, the modal
value of the distribution is 44%: if a buyer in this category buys, the most likely purchase is a
whole life-limited pay policy under $10,000.
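The two percentages quoted for Example 1 can be recovered from the marginal totals of the table. The sketch below uses only the legible TOTAL entries printed for Example 1; treating LEVT, DECT, COMB, and OTHR as the columns that involve term insurance is an inference from the variable definitions rather than something the text states, and the variable names are made up for this illustration.

```python
# Column (policy type) totals printed for Example 1; only the columns needed here.
type_totals = {"LEVT": 0.00, "DECT": 0.01, "COMB": 0.03, "OTHR": 0.05}

# Row (policy amount) totals printed for Example 1.
amount_totals = {">50K": 0.00, "25-50K": 0.01, "10-25K": 0.03, "10K": 0.09, "<10K": 0.87}

# Assumed grouping: these four policy types are the ones involving term insurance.
TERM_TYPES = ("LEVT", "DECT", "COMB", "OTHR")

p_term = sum(type_totals[name] for name in TERM_TYPES)
p_10k_or_more = sum(value for name, value in amount_totals.items() if name != "<10K")

print(round(p_term, 2), round(p_10k_or_more, 2))
```

The second figure is also the complement of the "< 10K" row total, which shows that the 13% refers to policies of $10,000 or more rather than to policies strictly above $10,000.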

Example 2 Buyer - This is a married male buyer between twenty and twenty-four years of age. This
buyer has an annual income of $25,000 or greater and is not considered a professional or a
semiprofessional.

ESTIMATED PROBABILITIES

POLICY AMOUNT VS POLICY TYPE

           WLCP   WLLP   MODL   ENDR   LEVT   DECT   COMB   OTHR   TOTAL
> 50K      0.01   0.00   0.00   0.01   0.00   0.01   0.00   0.01   0.04
25-50K     0.05   0.00   0.01   0.03   0.02   0.03   0.02   0.03   0.18
10-25K     0.12   0.03   0.01   0.04   0.04   0.07   0.06   0.06   0.44
10K        0.13   0.02   0.01   0.03   0.01   0.02   0.01   0.04   0.26
< 10K      0.05   0.02   0.00   0.01   0.00   0.00   0.00   0.02   0.09
TOTAL      0.35   0.07   0.03   0.11   0.06   0.12   0.09   0.16   1.00

The resolution of the combination between size of policy and type of policy is not as distinct as
was the case with Example 1. The modal value of the distribution is only 13%, the probability that,
if the potential buyer purchases a policy, it will be a whole life-continuous pay policy equal to
$10,000. Additionally, the predicted distribution shows that there is a 25% probability of a buyer
in this category, if he were to buy, purchasing a whole life-continuous pay type policy in the
range of $10,000-25,000. In fact, there is close to a 70% probability that, if a prospect purchases
a policy, it will be over $10,000.

Example 3 Buyer - This buyer is a male professional who is married and between the ages of thirty
and thirty-nine and who has an annual income of between $10,000 and $24,999.


ESTIMATED PROBABILITIES

           WLCP   WLLP   MODL   ENDR   LEVT   DECT   COMB   OTHR   TOTAL
>50K       0.01   0.00   0.00   0.01   0.00   0.00   0.00   0.01   0.03
25-50K     0.06   0.00   0.00   0.02   0.02   0.02   0.02   0.02   0.15
10-25K     0.10   0.03   0.01   0.03   0.04   0.06   0.07   0.04   0.38
10K        0.12   0.03   0.01   0.03   0.01   0.02   0.01   0.03   0.24
<10K       0.11   0.04   0.01   0.02   0.00   0.00   0.00   0.03   0.21
TOTAL      0.38   0.10   0.03   0.10   0.07   0.10   0.11   0.12   1.00


The conditional probability distribution of Example 3 is similar to that shown by Example 2; however,
there is a downward shift in policy size. This is no doubt due to the difference in annual income
between the Example 2 buyer and the Example 3 buyer. The modal value of Example 3's distribution
is 12%, the probability that, if the person buys, he will purchase a whole life-continuous pay
policy with a value of $10,000.


An illustration will now be given of how the above probability tables were obtained. Several
dimensions were used above; however, only one will be used in the illustration.

Since we know that the first pair of canonical variates, $V_1$ and $W_1$, gives the maximum
correlation $R_1$ between two variates, we can actually draw a regression line whose equation is
$W_1 = R_1 V_1$, where the slope of the line is a function of $R_1$.

From this line we may obtain the distribution of buyers ($d_1$) for a given product $P_1$ (40
products may be obtained out of the table). Likewise, we may obtain $d_2$ for product $P_2$.
Then we obtain the probabilities of $P_1$ and $P_2$ for the buyer B on the W axis, as shown,
using Bayes' Theorem.
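
The Bayes' Theorem step can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the buyer distributions for the two products are taken to be normal curves along the canonical axis, and the prior shares, means and standard deviations are hypothetical values chosen for illustration, not figures from the study.

```python
import math

def normal_pdf(w, mean, sd):
    """Density at w of a normal distribution with the given mean and sd."""
    z = (w - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def product_probabilities(w, products):
    """Bayes' theorem: P(product | w) is proportional to prior * density(w | product)."""
    joint = {name: prior * normal_pdf(w, mean, sd)
             for name, (prior, mean, sd) in products.items()}
    total = sum(joint.values())
    return {name: value / total for name, value in joint.items()}

# Hypothetical values: (prior share of sales, mean, sd of the buyer distribution).
products = {"P1": (0.6, -0.5, 1.0), "P2": (0.4, 1.0, 1.0)}
probs = product_probabilities(0.8, products)
print(probs)
```

With these hypothetical inputs, a buyer located at 0.8 on the axis is assigned to product P2 with a probability of about 0.60, even though P2 has the smaller prior share.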


The above examples indicate the importance of canonical correlation methods as applied to the
buyer-product relationship.

Another important application of the canonical correlation method deals with the manner in which
companies relate to one another based on what they sell and to whom they sell. Given an insurance
company's characteristics such as company size, whether it is stock or mutual, ordinary or
combination, we can determine the company's peers insofar as how the company characteristics relate
to buyers and products sold.

The same sample was used in this analysis as was used to determine the buyer-product
probabilities. Some of the designated variables used in the analysis are as follows:

"Left Side" Variables            "Right Side" Variables
Buyer and Product                Company Characteristics

X1    Sex of buyer               Y1    Type of company
X2    Age of buyer               Y2    Age of company
X3    Marital status             Y3    Volume of business
X4    Occupation                 Y4    Area of operation
X5    Income                     Y5    Compensation plan
X6    State of residence         Y6    Commissions paid
X7    Type of policy             Y7    Advertising expenses
X8    Size of policy             Y8    Reserves
X9    Mode of payment            Y9    Claims
X10   Annualized premium         Y10   Participating or nonparticipating
                                 Y11   Assets
                                 Y12   Licensed in New York
                                 Y13   Lapse rate

The following represents a graph of the first two right hand canonical variates $W_1$ and $W_2$. It
shows the plotted locations of 94 insurance companies that contributed to the 1970 LIMRA Buyer Study.

A plot, in the $W_1$-$W_2$ space, of each of the 94 contributors to LIMRA's 1970 Buyer Study. The
location of each company point is a function of the company's characteristics. Each characteristic is
weighted according to how it relates to who buys company products and what is bought.

The $W_1$ dimension indicates that the buyers and products sold are related to the size of the
company and whether it is a stock or mutual company. In $W_2$, which is the second most significant
dimension, an ordinary company versus a combination company relationship with buyers and products
sold is indicated. From examining the results of this analysis it would seem that more
homogeneous comparisons of buyer and product can be made by grouping companies on a size and type
basis. To determine the peers for a particular company, it would be appropriate to employ all
canonical variates in order to calculate Mahalanobis' distance (the weighted distance between
two points).

Discussion 

Canonical correlation analysis may be regarded as a type of principal components analysis.
Thus, the rules by which canonical variates are interpreted are the same as those used to
interpret discriminant functions and principal components. The relative magnitudes and signs of
several combining weights, which define each of the canonical variates, are closely examined to
see if a meaningful interpretation can be given. This type of analysis can have widespread
applications. For example, canonical correlation analysis could prove highly useful in studying
the relationships between personality attributes and favorable career fields. It would be a
means by which to indicate "the right job for the right person." An application such as this
could increase job satisfaction among workers and save a company money by decreasing employee
turnover due to misplaced individuals.

The above example relates one area in which probability estimates could be formed by using
canonical correlation methods. The probabilistic structure of canonical correlation indicates
that the use of this type of statistical procedure can and should be further extended into the
areas of education, psychology, science and industry. It would seem to have numerous possible
applications in meteorology as well.

Chapter 7


MARKOV PROCESSES
BY

CAPTAIN ROGER WHITON


1. Introduction.

Many mathematical idealizations or models of nature, as well as a fair number of actual physical
processes themselves, have the property that the outcome of any trial or event opportunity depends
only on the outcome of the immediately preceding trial. There is no dependency on the history of
earlier trials. Processes having this property are referred to as Markov processes or Markov
chains, after Andrei Andreevich Markov (1856-1922), whose 1906-1907 studies of the Brownian
motion of gas molecules in a closed container laid the groundwork for this subject and later led
to the investigation of dependence and stochastic processes. The first correct mathematical
construction of a Markov process with continuous trajectories is attributed to Norbert Wiener in 1923.
The general theory of Markov processes was developed in the 1930s and 1940s by A. N. Kolmogorov,
W. Feller, W. Doeblin, P. Levy, J. L. Doob, and others. Formulation of the Markov chain
antedated the principal mathematical development of stochastic processes. In retrospect it can be seen
that the Markov chain is actually an extremely simple, nevertheless elegant member of the class of
stochastic processes.

It is not difficult to imagine circumstances in which a Markov chain can apply. A classical
albeit somewhat artificial example is the random walk problem in which a person or object moves in
single steps from one position to another along a line, with one move permitted in each trial.
For each current position $i$, there exist conditional probabilities for a move from $i$ to $i + 1$,
from $i$ to $i - 1$, or from $i$ to $i$ (a "stationary move"). The future move is made in accordance
with those conditional probabilities, but the determination of which probabilities apply is a function
strictly of the earlier trial, which determined the move to $i$. Under this conceptualization, the
history of moves preceding the one that put the body at $i$ has exactly no bearing on the move now
at hand. Other examples, involving branching problems, multinomial trials, success run chains,
certain urn models, and mathematical diffusion models, have been developed. With varying degrees
of success, the Markov concept has been applied to real phenomena such as paths of free electrons
in crystals, queuing problems and even brand selection preferences. Many natural phenomena,
however, exhibit complex dependencies and periodicities that tend to violate the simple Markov
requirement. Whether the Markov chain is a usefully valid model of these real phenomena depends
on the extent to which real world complexities violate the Markov concept and degrade its
predictions. In general, the suitability of the Markov chain as a model of real phenomena is
determined by empirical test.
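
The random walk described above can be simulated directly. The following is a minimal sketch, assuming for simplicity that the move probabilities are the same at every position (the text allows them to depend on the current position); the function name, the seed and the probability values are illustrative assumptions.

```python
import random

def random_walk(start, steps, p_up, p_down, seed=0):
    """Simulate a one-dimensional random walk as a Markov chain.

    At each trial the position i moves to i + 1 with probability p_up,
    to i - 1 with probability p_down, and otherwise stays at i.
    Only the current position is consulted, never the earlier history.
    """
    rng = random.Random(seed)
    position = start
    path = [position]
    for _ in range(steps):
        u = rng.random()
        if u < p_up:
            position += 1
        elif u < p_up + p_down:
            position -= 1
        path.append(position)
    return path

path = random_walk(start=0, steps=10, p_up=0.3, p_down=0.3)
print(path)
```

The remaining probability, 1 - p_up - p_down, is the chance of the "stationary move."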

The  weather  has  been  considered  to  change  from  one  state  or  condition  to  another  according 
to  the  Markov  process,  with  the  outcome  of  trial  (or  weather  observation)  n depending  exclusively 
on  the  outcome  of  the  trial  or  observation  n - 1 immediately  preceding. 


Before  undertaking  a discussion  of  Markov  chains,  let  us  review  some  simple  concepts  from 
matrix  algebra  and  probability  theory. 

2.  Matrix  Multiplication. 

If $A$ is an m x p matrix (m = number of rows $i$ and p = number of columns $j$), and if $B$ is
a p x n matrix, the product $C = AB$ of the two matrices exists. Such a product must be an
m x n matrix. The rule for formation of the matrix product is

\[ c_{ij} = \sum_{k=1}^{p} a_{ik} b_{kj} \]

If $A$ is an n-square matrix, then we can form all the powers of $A$, namely:

\[ A^2 = A A, \qquad A^3 = A A^2, \qquad \ldots, \qquad A^n = A A^{n-1} \]

Routines for matrix multiplication are normally resident in the mathematics libraries of computer
installations and are useful in the case of matrices larger than 3 x 3 or for chains of matrix
multiplications. Smaller problems are easily handled by manual methods.
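
For readers without access to a library routine, the product rule can be written out in a few lines. This is a minimal sketch in plain Python, with matrices held as lists of rows; the function names `matmul` and `matpow` and the small test matrices are illustrative choices, not routines referred to in the text.

```python
def matmul(a, b):
    """Product of an m x p matrix a and a p x n matrix b (lists of rows)."""
    p = len(b)
    if any(len(row) != p for row in a):
        raise ValueError("columns of a must equal rows of b")
    n = len(b[0])
    # c[i][j] is the sum over k of a[i][k] * b[k][j].
    return [[sum(row[k] * b[k][j] for k in range(p)) for j in range(n)]
            for row in a]

def matpow(a, n):
    """n-th power (n >= 1) of a square matrix by repeated multiplication."""
    result = a
    for _ in range(n - 1):
        result = matmul(result, a)
    return result

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
print(matmul(A, B))   # [[2, 1], [4, 3]]
print(matpow(A, 2))   # [[7, 10], [15, 22]]
```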

3.  Matrix  Inversion. 

If $A$ is a non-singular, m x m, square matrix, it has an inverse $A^{-1}$ such that

\[ A A^{-1} = I \]

where $I$ is the identity matrix (consisting of all ones along the main diagonal and zeros elsewhere).

The process of inverting a matrix is inherently more complex than that of matrix multiplication.
Computer library matrix inversion routines are correspondingly more valuable and more often used
than routines for matrix multiplication. While a variety of matrix inversion methods is available,
a simple technique usable for hand inversion of smaller matrices (up to 3 x 3 conveniently) is
illustrated below. This is not usually the method selected for computer implementation.

In the inversion of an m x m matrix $A$ by the method of cofactors, one first forms the
cofactor matrix $A^c$ of $A$. Each element $a^c_{ij}$ of the cofactor matrix is the determinant of
the minor formed by deleting from the original matrix the row $i$ and column $j$, taken with the
sign $(-1)^{i+j}$. For example, for a 3 x 3 matrix,

\[ a^c_{11} = a_{22}a_{33} - a_{23}a_{32} \]
\[ a^c_{12} = -(a_{21}a_{33} - a_{23}a_{31}) \]

With the cofactor matrix available, the next step is to form the so called adjoint matrix $A^a$,
which is simply the transpose of the cofactor matrix $A^c$:

\[ A^a = (A^c)' \]

Transposing a matrix is simple. One writes as the columns of the transpose those elements comprising
the rows of the original. Thus,

\[ A^a = (A^c)' = \begin{bmatrix} a^c_{11} & a^c_{21} & a^c_{31} \\ a^c_{12} & a^c_{22} & a^c_{32} \\ a^c_{13} & a^c_{23} & a^c_{33} \end{bmatrix} \]

With the adjoint matrix available, one forms the inverse of $A$ simply by dividing the adjoint matrix
$A^a$ by the determinant $|A|$ of the original matrix, i.e.,

\[ A^{-1} = A^a / |A| \]

Since $|A|$ is a scalar, one simply divides each element $a^a_{ij}$ of the adjoint by that number to
obtain the inverse $A^{-1}$. This manual technique has been used to invert the matrices in this chapter
because they are all 3 x 3 or smaller. The technique is the adjoint method, which is closely related
to Cramer's rule.
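
The cofactor procedure translates almost line for line into code. The sketch below is a plain Python illustration for small matrices of order 2 or more; the function names and the 2 x 2 test matrix are assumptions made for the example, and, as the text notes, this is not the method a library routine would normally use.

```python
def minor(a, i, j):
    """Matrix a with row i and column j deleted."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]

def det(a):
    """Determinant by cofactor expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det(minor(a, 0, j)) for j in range(len(a)))

def inverse(a):
    """Inverse by the cofactor (adjoint) method, for matrices of order >= 2."""
    n = len(a)
    d = det(a)
    if d == 0:
        raise ValueError("matrix is singular")
    # Signed minors form the cofactor matrix.
    cof = [[(-1) ** (i + j) * det(minor(a, i, j)) for j in range(n)]
           for i in range(n)]
    # Adjoint = transpose of the cofactor matrix; divide by the determinant.
    return [[cof[j][i] / d for j in range(n)] for i in range(n)]

A = [[2.0, 1.0], [5.0, 3.0]]
print(inverse(A))  # [[3.0, -1.0], [-5.0, 2.0]]
```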

4.  Fixed  Vector  or  Fixed  Point. 


Consider the vector (row matrix) $q$ having n components, where

\[ q = [\, q_1 \;\; q_2 \;\; q_3 \;\; \ldots \;\; q_j \;\; \ldots \;\; q_{n-1} \;\; q_n \,] \]

Such a vector is (1 x n)-dimensional. Provided the matrix $A$ is square and (n x n), we can define
a (1 x n) matrix product $qA$. If, furthermore,

\[ qA = q \]

then we say that $q$ is "left fixed" (not changed) by its multiplication with $A$. Any vector $q \neq 0$ is
a fixed vector (or so called fixed point) of $A$ if it is left fixed when multiplied by $A$. To
illustrate, let us choose

\[ q = [\, q_1 \;\; q_2 \,] \quad (n = 2), \qquad A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \quad (m = n = 2) \]

If $qA = q$, then

\[ [\, (q_1 a_{11} + q_2 a_{21}) \;\; (q_1 a_{12} + q_2 a_{22}) \,] = [\, q_1 \;\; q_2 \,] \]

\[ q_1 a_{11} + q_2 a_{21} = q_1 \]

\[ q_1 a_{12} + q_2 a_{22} = q_2 \]

\[ q_2 = q_1 \, \frac{1 - a_{11}}{a_{21}} = q_1 \, \frac{a_{12}}{1 - a_{22}} \]

If we are given, for example, the matrix


\[ A = \begin{bmatrix} 2 & 1 \\ 2 & 3 \end{bmatrix} \]

then we can develop any number of fixed vectors $q$ of the matrix $A$ simply by choosing arbitrary $q_1$.
For example, if we select $q_1 = -1$, then $q_2 = 1/2$, giving

\[ q = [\, -1 \;\; 1/2 \,] \]

If, on the other hand, we select $q_1 = 2$, then

\[ q = [\, 2 \;\; -1 \,] \]

These are two of infinitely many fixed vectors of $A$.
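
A quick numerical check of the fixed-vector idea is sketched below. The 2 x 2 matrix used here is an assumption: it was chosen because it leaves both of the vectors quoted above unchanged, and the helper names are illustrative. The shortcut formula only yields a fixed vector when the matrix actually has one.

```python
def vecmat(q, a):
    """Row vector q times matrix a (list of rows)."""
    return [sum(q[i] * a[i][j] for i in range(len(q))) for j in range(len(a[0]))]

# Assumed example matrix, consistent with the fixed vectors [-1, 1/2] and [2, -1].
A = [[2, 1], [2, 3]]

def fixed_vector(q1, a):
    """Fixed vector [q1, q2] of a 2 x 2 matrix, using q2 = q1 (1 - a11) / a21."""
    return [q1, q1 * (1 - a[0][0]) / a[1][0]]

for q1 in (-1, 2):
    q = fixed_vector(q1, A)
    print(q, vecmat(q, A))  # the two printed vectors agree
```

Any multiple of a fixed vector is again a fixed vector, which is why infinitely many of them exist.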


5.  Probability  Vectors  and  Stochastic  Matrices. 


The probabilities associated with various states of a system may be expressed in terms of a
probability vector $p$ having one element for each such state. For example, if the weather is
characterized in terms of three mutually exclusive and exhaustive states, "stormy," "unsettled"
and "fair" (states 1, 2 and 3, respectively), then the likelihood of occurrence of each state is
given by a probability vector $p$:

\[ p = [\, p_1 \;\; p_2 \;\; p_3 \,] \]

We say that a vector,

\[ p = [\, p_1 \;\; p_2 \;\; p_3 \;\; \ldots \;\; p_j \;\; \ldots \;\; p_{n-1} \;\; p_n \,] \]

is a probability vector if its components are non-negative and their sum is unity.

Extending  this  idea  of  probability  as  a vector,  where  the  vector  is  a one-dimensional  matrix, 
we  can  express  the  likelihood  of  transition  from  one  state  to  another  as  a stochastic  matrix. 

For example, in the three-state problem given above, if the present weather is "unsettled" (state
2), then it can either change to "fair" (a 2,3 transition), change to "stormy" (a 2,1 transition),
or remain the same (a 2,2 "transition"*). Thus, associated with present state 2 there are three
transition probabilities: $p_{21}$, $p_{22}$ and $p_{23}$. Likewise, there are three other transition
probabilities associated with present state 1 and three more with present state 3. Plainly, the
likelihood of "change"* in the weather can in our three-state problem be characterized in terms of
a 3 x 3 = 9-component matrix:

\[ P = \begin{bmatrix} p_{11} & p_{12} & p_{13} \\ p_{21} & p_{22} & p_{23} \\ p_{31} & p_{32} & p_{33} \end{bmatrix} \]

As it turns out, such a matrix is a stochastic matrix. In fact, it is a particular form of
stochastic matrix called a "transition matrix" which will be described in a later section. The
probabilities $p_{ij}$ are conditional probabilities that state $j$ will occur given the system is
in state $i$. In equivalent notation,

\[ p_{ij} = P\{\, j \mid i \,\} \]

A square matrix is called a stochastic matrix if each of its rows is a probability
vector. If two matrices $P_1$ and $P_2$ are stochastic, their product $P_1 P_2$ and the powers
$P_1^n$ and $P_2^n$ are also stochastic matrices. A stochastic matrix $P$ is said to be regular if
all the elements of some one of its powers $P^n$ are positive. A power containing zeroes or negative
numbers does not qualify.
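
These definitions are easy to test numerically. The sketch below is a minimal plain Python illustration; the function names, the tolerance and the cutoff `max_power` on the number of powers examined are assumptions of the example, and the first matrix is the three-state weather matrix used later in this chapter.

```python
def matmul(a, b):
    """Product of two matrices held as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def is_probability_vector(v, tol=1e-9):
    """Components non-negative and summing to unity (within tol)."""
    return all(x >= 0 for x in v) and abs(sum(v) - 1.0) <= tol

def is_stochastic(p, tol=1e-9):
    """Square matrix whose rows are all probability vectors."""
    return all(len(row) == len(p) and is_probability_vector(row, tol) for row in p)

def is_regular(p, max_power=50):
    """True if some power P^n with n <= max_power has all entries positive."""
    power = p
    for _ in range(max_power):
        if all(x > 0 for row in power for x in row):
            return True
        power = matmul(power, p)
    return False

P = [[0.4, 0.5, 0.1], [0.1, 0.5, 0.4], [0.01, 0.09, 0.9]]
print(is_stochastic(P), is_regular(P))               # True True
print(is_regular([[0.0, 1.0], [1.0, 0.0]]))          # False
print(is_regular([[0.0, 1.0], [0.5, 0.5]]))          # True
```

The last two cases show why the test must look at powers: a zero in the matrix itself is not disqualifying if it disappears in a higher power, whereas the alternating matrix never loses its zeroes.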

Regular stochastic matrices have mathematically attractive properties. If $P$ is a regular
stochastic matrix, then it follows that:

- Associated with $P$ is a unique fixed probability vector $t$ each of whose components is
positive and for which, by definition,

\[ tP = t \]

- The sequence of powers of $P$, namely, $P$, $P^2$, $P^3$, ..., approaches a matrix $T$ each of whose
rows is simply the fixed probability vector $t$.

- If $p$ is any probability vector, then the sequence of vectors $pP$, $pP^2$, $pP^3$, ... approaches
the unique fixed probability vector $t$.

Notice we said the fixed vector $t$ is "unique," whereas fixed vectors in general are far from
unique. What makes $t$ unique is our insistence that it be more than an ordinary fixed vector. We
required that it also be a probability vector, the sum of whose elements is unity. This additional
constraint is sufficient to reduce the infinity of candidate fixed vectors of $P$ to a single fixed
probability vector $t$, in other words, a unique fixed vector.

*We are using "change" and "transition" in their extended sense, which includes the act of
remaining the same, or "persisting."

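To see how this works, let us consider the transition matrix $P$ characterizing the three-state
weather problem discussed earlier:

\[ P = \begin{bmatrix} 0.4 & 0.5 & 0.1 \\ 0.1 & 0.5 & 0.4 \\ 0.01 & 0.09 & 0.9 \end{bmatrix} \]

We seek a fixed probability vector,

\[ t = [\, t_1 \;\; t_2 \;\; t_3 \,], \qquad t_3 = 1 - t_1 - t_2 \]

such that

\[ tP = t \]

Performing the indicated matrix multiplication $tP$ and setting the product equal to the vector $t$
yields the system,

\[ p_{31} + (p_{11} - p_{31} - 1)\, t_1 + (p_{21} - p_{31})\, t_2 = 0 \]
\[ p_{32} + (p_{12} - p_{32})\, t_1 + (p_{22} - p_{32} - 1)\, t_2 = 0 \]
\[ p_{33} - 1 + (p_{13} - p_{33} + 1)\, t_1 + (p_{23} - p_{33} + 1)\, t_2 = 0 \]

We can simplify the notation by using

\[ a = p_{31} \qquad\qquad d = p_{32} \qquad\qquad g = p_{33} - 1 \]
\[ b = p_{11} - p_{31} - 1 \qquad e = p_{12} - p_{32} \qquad h = p_{13} - p_{33} + 1 \]
\[ c = p_{21} - p_{31} \qquad f = p_{22} - p_{32} - 1 \qquad i = p_{23} - p_{33} + 1 \]

which yields

\[ a + b t_1 + c t_2 = 0 \]
\[ d + e t_1 + f t_2 = 0 \]
\[ g + h t_1 + i t_2 = 0 \]

where obviously the final equation is superfluous, since the three equations sum identically to
zero. The problem reduces to one of solving two simultaneous equations. From the first,

\[ t_2 = -\frac{a + b t_1}{c} \]

Using the second,

\[ t_1 = \frac{fa - cd}{ce - fb} \]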
Substituting the numerical values provided in the transition matrix, we obtain

\[ a = 0.01, \quad b = -0.61, \quad c = 0.09, \quad d = 0.09, \quad e = 0.41, \quad f = -0.59 \]

From these values, we obtain

\[ t_1 = 0.0433 \]

Therefore,

\[ t_2 = 0.1827 \]

and

\[ t_3 = 1 - t_1 - t_2 = 0.7740 \]

Hence,

\[ t = [\, 0.0433 \;\; 0.1827 \;\; 0.7740 \,] \]

That $t$ is indeed a fixed vector of $P$ can be verified by performing the matrix multiplication $tP$
and noting that the product is equal to $t$ within the limits of truncation error.

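When the dimensionality, i.e., the number of Markov states, of the problem is large, the
straightforward algebraic method shown above for obtaining the fixed probability vector $t$ becomes
unwieldy. Under these circumstances, matrix solutions may prove advantageous, since computational
routines for them are available in many computer libraries. Let us reconsider our set of
simultaneous equations:

\[ b t_1 + c t_2 = -a \]
\[ e t_1 + f t_2 = -d \]

or

\[ \begin{bmatrix} b & c \\ e & f \end{bmatrix} \begin{bmatrix} t_1 \\ t_2 \end{bmatrix} = \begin{bmatrix} -a \\ -d \end{bmatrix} \]

Hence,

\[ \begin{bmatrix} t_1 \\ t_2 \end{bmatrix} = \begin{bmatrix} b & c \\ e & f \end{bmatrix}^{-1} \begin{bmatrix} -a \\ -d \end{bmatrix} \]

Taking the inverse,

\[ \begin{bmatrix} t_1 \\ t_2 \end{bmatrix} = \begin{bmatrix} f/(bf-ce) & -c/(bf-ce) \\ -e/(bf-ce) & b/(bf-ce) \end{bmatrix} \begin{bmatrix} -a \\ -d \end{bmatrix} = \begin{bmatrix} (cd-fa)/(bf-ce) \\ (ae-bd)/(bf-ce) \end{bmatrix} \]

and

\[ t = [\, 0.0433 \;\; 0.1827 \;\; 0.7740 \,] \]

which was the result obtained previously by non-matrix methods.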

One of the properties of the regular stochastic matrix is that the sequence of its powers
approaches the matrix $T$ each of whose rows is simply the fixed probability vector $t$ associated with
$P$. In the present case,

\[ T = \begin{bmatrix} 0.0433 & 0.1827 & 0.7740 \\ 0.0433 & 0.1827 & 0.7740 \\ 0.0433 & 0.1827 & 0.7740 \end{bmatrix} \]

Forming some of the powers of the matrix $P$,

\[ P^2 = \begin{bmatrix} 0.2110 & 0.4590 & 0.3300 \\ 0.0940 & 0.3360 & 0.5700 \\ 0.0220 & 0.1310 & 0.8470 \end{bmatrix} \]

\[ P^3 = \begin{bmatrix} 0.1336 & 0.3647 & 0.5017 \\ 0.0769 & 0.2663 & 0.6568 \\ 0.0304 & 0.1527 & 0.8169 \end{bmatrix} \]

\[ P^6 = \begin{bmatrix} 0.0611 & 0.2225 & 0.7164 \\ 0.0507 & 0.1993 & 0.7500 \\ 0.0406 & 0.1765 & 0.7829 \end{bmatrix} \]

\[ P^{12} = \begin{bmatrix} 0.0441 & 0.1844 & 0.7715 \\ 0.0437 & 0.1834 & 0.7730 \\ 0.0432 & 0.1824 & 0.7744 \end{bmatrix} \]

Eventually, $P^n$ agrees with $T$ to the fourth decimal place.

The fact that the transition matrix $P$ converges to $T$ in sufficiently great powers gives the
fixed probability vector $t$ a special meaning in the Markov process. As shown below, $t$ represents
the long term likelihood of occurrence of each of the Markov states over many Markov "trials."
Meteorologically, the $t$ vector embodies the unconditional probabilities $p_j = P\{a_j\}$ of each of the
Markov states, i.e., the climatological relative frequency of the states.
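
The convergence of the powers can be reproduced with a short loop. This is a minimal sketch in plain Python; the dictionary `powers` and the choice of which powers to print are illustrative assumptions.

```python
def matmul(a, b):
    """Product of two matrices held as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

P = [[0.4, 0.5, 0.1],
     [0.1, 0.5, 0.4],
     [0.01, 0.09, 0.9]]

# powers[n] holds the n-th power of P.
powers = {1: P}
for n in range(2, 25):
    powers[n] = matmul(powers[n - 1], P)

for n in (2, 3, 6, 12, 24):
    print(n, [[round(x, 4) for x in row] for row in powers[n]])
```

By the 24th power every row printed agrees with the fixed probability vector to four decimal places, whatever the starting row.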

6. The Markov Process or Markov Chain.

The  Markov  chain  is  a mathematical  model  of  the  behavior  of  a system.  When  used  to  represent 
actual  systems,  real  processes  or  the  behavior  of  nature,  the  Markov  chain  represents  a simplified 
generalization  of  complex,  varied  reality.  Like  any  model,  the  Markov  chain  succeeds  to  a greater  or 
a lesser  extent  In  portraying  the  actual  behavior  of  real  systems,  depending  on  the  extent  to  which 
those  systems  correspond  to  the  requirements  of  the  model. 

The  Markov  model  of  the  behavior  of  a system  has  two  such  requirements  or  defining  properties: 

- The system under consideration may be categorized as being in one of a finite number of
states $a_i$, the complete set of which, namely,

\[ (a_1, a_2, \ldots, a_m) \]

constitutes the state space of the system.

- At each trial, the system has the opportunity either to change its state or to remain in
the same state. The outcome of any trial in terms of the state of the system depends at
most upon the outcome of the immediately preceding trial, and not upon any previous outcome.

The Markov process or so called finite Markov chain is a stochastic process embodying these
two model conditions. In the Markov process, we envision the state of the system changing from
$a_i$ to $a_j$, where either $i \neq j$ or $i = j$. This is termed an $(a_i, a_j)$-transition, or in short
an $(i, j)$-transition, meaning the state $a_j$ occurs immediately after $a_i$ occurs. In other words, the
outcome of trial n is state $a_i$ and the outcome of trial n + 1 is state $a_j$. The probability that
a system in state $a_i$ will undergo a transition to state $a_j$ is $p_{ij}$, called a transition probability,
a conditional probability. As we discussed before, the transition probabilities $p_{ij}$ form a
transition matrix $P$, an m x m stochastic matrix where m is the number of permitted states $a_i$.


\[ P = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1m} \\ p_{21} & p_{22} & \cdots & p_{2m} \\ \vdots & & p_{ij} & \vdots \\ p_{m1} & p_{m2} & \cdots & p_{mm} \end{bmatrix} \]

where $i$ = current state and $j$ = future state.

For each current state $a_i$, the ith row of the transition matrix is the conditional probability
vector of all possible state outcomes in the next trial. The fact that $P$ is a stochastic
matrix guarantees each of its rows will be a probability vector.

Markov models have been used to attempt to represent changes in the weather as transitions
from one Markov state to another. These meteorological applications of the Markov model consider
that nature has a new "trial" at each of N regularly spaced observation intervals $\Delta t$, where
$\Delta t$ might logically be the weather observation interval of 1 hr. For each such trial, there
must be an outcome in the form of a Markov state $a_i$.

We can illustrate by categorizing the behavior of the weather in terms of the same three
Markov states we considered before:

\[ a_1 = \text{Stormy} \qquad a_2 = \text{Unsettled} \qquad a_3 = \text{Fair} \]

This three-state Markov problem is an oversimplification used here for illustrative purposes.

In  actual  problems,  in  order  to  obtain  needed  resolution  in  the  forecast  scheme,  there  can  be 
hundreds  of  Markov  states  corresponding  to  realistically  resolved  discretizations  of  such  observed 
variables  as  ceiling,  visibility,  sea  level  pressure,  temperature,  dew  point  temperature,  wind, 
and  others.  Methods  for  handling  n-state  Markov  chains  will  be  discussed  in  a later  section. 
Meanwhile,  the  three-state  problem  will  suffice  to  exemplify  basic  Markov  concepts. 

The foundation upon which any Markov analysis stands is historical data on the performance
of the system over many trials. In the meteorological sense, this historical data represents
climatology and might be a list of weather observations such as that shown below:

Trial = Time*        Outcome = State of Nature

[Table of trials 1 through 5 and the state $a_i$ observed at each trial]

If sufficient data are available, the sequence of trials and outcomes called climatology can
be converted into a Markov transition matrix $P$. For convenience, let us suppose our data have
yielded the transition matrix discussed in section 5 above. The interpretation of the $P$ matrix is
clear. Let us imagine that the current state is "unsettled" (state $a_2$, $i = 2$). Under these
circumstances, the applicable vector of conditional probabilities is the second row of the matrix,
i.e.,

\[ p_2 = [\, 0.1 \;\; 0.5 \;\; 0.4 \,] \]

This probability vector indicates, given that the weather is now "unsettled," there is a 10
percent likelihood conditions will change to "stormy," a 40 percent chance they will improve to
"fair," and a 50 percent chance the weather will remain the same. These probabilities can be
taken as a probabilistic forecast of the state of the system for one time step in the future, i.e.,
at time $t_0 + \Delta t$, where $t_0$ is the initial time. For the case where the transition matrix is
based on one-hour weather changes, the probability vector provides a one-hour probabilistic weather
forecast.
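
The conversion of a record of observations into a transition matrix can be sketched as follows. This is a minimal illustration, assuming the usual relative-frequency estimate (count each observed transition and divide by the row total); the text does not prescribe a particular estimator, and the short record of states below is hypothetical, not real climatology.

```python
def estimate_transition_matrix(states, m):
    """Estimate P from a sequence of observed states numbered 1..m."""
    counts = [[0] * m for _ in range(m)]
    # Each consecutive pair (current, future) is one observed transition.
    for current, future in zip(states, states[1:]):
        counts[current - 1][future - 1] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0:
            raise ValueError("a state never occurs as a current state")
        matrix.append([c / total for c in row])
    return matrix

# Hypothetical hourly record: 1 = stormy, 2 = unsettled, 3 = fair.
observed = [2, 3, 3, 3, 2, 2, 1, 2, 3, 3, 3, 3, 2, 1, 1, 2, 3]
P_hat = estimate_transition_matrix(observed, 3)
for row in P_hat:
    print([round(x, 3) for x in row])
```

Row 2 of the estimate is then the one-step forecast when the current state is "unsettled." In practice a far longer record would be needed before such relative frequencies could be trusted.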


But can we apply the Markov model to weather changes? If we are to do so, the weather must
meet the condition that the outcome of any trial or weather observation depends only on the
outcome of the preceding trial or observation. It is by no means certain that this is the case.
Indeed the notorious dependency and periodicity of weather data suggest such a premise is not
true in general. Unlike the classical random walk and coin toss problems, in which the Markov
requirement is met, or the urn problem without replacement, in which it is not, there are no
a priori means of ascertaining whether the weather meets the requirements of a Markov process.

*Units are arbitrary and problem-dependent. In problems where observations at 1 hr intervals
are used to construct the transition matrix, $\Delta t$ = 1 hr, and the unit of time is hours.

On the other hand, the Markov chain may be sufficiently effective as a model of atmospheric
behavior to have skill in weather prediction. The best test is to state the process of weather
change in Markov terms and then to consider empirically the performance of the Markov model in
simulating weather events. If, for example, a long series of Markov predictions neither violates
climatology nor lacks skill in prognosis when applied to independent data, then the Markov model
can be considered applicable to weather events.

If our discussion were to end here, we might be justified in concluding the Markov concept
has little practical value, being limited merely to forecasts for the next trial, i.e., the next
observation time. In practice, the meteorologist prepares forecasts for several hours in advance,
not just one. Fortunately, the Markov process readily extends to higher order transition
probabilities by means of an n-step Markov chain. This concept is discussed in the following section.

7. The n-Step Markov Chain and Higher Order Transition Probabilities.

Assuming the weather changes according to a Markov process in which "trials" occur at intervals
of the normal hourly observation time, let us consider the problem of making a three-hour forecast
of "stormy," "unsettled" or "fair" in our three-state system, based on an initial state of
"unsettled" ($i = 2$).

If our problem had been to make a one-hour forecast, the problem would have been simple.
The forecast result is simply the probability vector $p_2$ discussed above. This is the trivial
one-step Markov process. Instead of this, our actual problem is the n-step Markov process, in which
n is three in this case.

Specifically, we desire the probability $p_{ij}^{(n)}$ that the system changes from state $a_i$ to $a_j$
in exactly n steps. In general, the sequence of states of nature can be given as

\[ a_i^{(0)} \to a_k^{(1)} \to a_l^{(2)} \to \cdots \to a_j^{(n)} \]

or in our particular three-step case,

\[ a_2^{(0)} \to a_k^{(1)} \to a_l^{(2)} \to a_j^{(3)} \]

The probabilities of the n-step transition from $a_i$ to $a_j$ form what is known as an n-step
transition matrix $P^{(n)}$. The n-step transition matrix turns out to equal the nth power of the
original stochastic transition matrix $P$. Thus,

\[ P^{(n)} = P^n \]

In our three-state problem, we earlier calculated the third power of $\mathbf{P}$ as

$$\mathbf{P}^3 = \begin{bmatrix} 0.1336 & 0.3647 & 0.5017 \\ 0.0769 & 0.2663 & 0.6568 \\ 0.0304 & 0.1527 & 0.8169 \end{bmatrix}$$

For the case of $i = 2$ initially, the conditional probabilities governing the three-hour forecast
would be

$$\mathbf{p}_2^{(3)} = [\,0.0769 \;\; 0.2663 \;\; 0.6568\,]$$

where the most likely event (with a 66 percent chance) is improvement in the weather. Note how
this contrasts with the one-hour forecast, where persistence of "unsettled" weather was most
likely:

$$\mathbf{p}_2^{(1)} = [\,0.1 \;\; 0.5 \;\; 0.4\,]$$

Since $\mathbf{P}$ is a regular stochastic matrix, its powers approach the matrix $\mathbf{T}$ each of whose rows is
simply the fixed probability vector $\mathbf{t}$. We illustrated this before. For example, the forecast
for six hours (n = 6) is

$$\mathbf{p}_2^{(6)} = [\,0.0507 \;\; 0.1993 \;\; 0.7500\,]$$

and for 12 hours (n = 12) is

$$\mathbf{p}_2^{(12)} = [\,0.0437 \;\; 0.1834 \;\; 0.7730\,]$$

At 24 hours we have

$$\mathbf{p}_2^{(24)} = [\,0.0433 \;\; 0.1827 \;\; 0.7740\,] = \mathbf{t}$$

to four decimal places. This convergence of $\mathbf{P}^n$ to $\mathbf{T}$ leads to the subject of the stationary
distribution of regular Markov chains.
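The n-step forecasts quoted in this section can be reproduced with a few lines of code. The following is a minimal sketch in plain Python, assuming the three-state transition matrix of this chapter's example (its entries are those appearing in the determinant of section 10); the helper names `mat_mul`, `mat_pow` and `forecast` are illustrative only and are not part of the original report.

```python
# Transition matrix of the three-state example (rows and columns ordered
# stormy, unsettled, fair). Values taken from this chapter's worked example.
P = [
    [0.40, 0.50, 0.10],
    [0.10, 0.50, 0.40],
    [0.01, 0.09, 0.90],
]

def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(m, n):
    """n-th power (n >= 1) of a square matrix by repeated multiplication."""
    result = m
    for _ in range(n - 1):
        result = mat_mul(result, m)
    return result

def forecast(initial_state, n):
    """n-step forecast: the row of P**n for a 1-based initial state."""
    return mat_pow(P, n)[initial_state - 1]

# Forecasts from an initial state of "unsettled" (state 2).
for hours in (1, 3, 6, 12, 24):
    print(hours, [round(p, 4) for p in forecast(2, hours)])
```

Repeated multiplication is used here only for clarity; for large n or large matrices the eigenvalue method of section 10 is the cheaper route.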


8. Stationary Distribution of the Markov Chain.

If a Markov problem is characterized by a transition matrix $\mathbf{P}$ that is regular, then the
sequence of n-step transition matrices $\mathbf{P}^n$ approaches the matrix $\mathbf{T}$ each of whose rows is the fixed
probability vector $\mathbf{t}$. Hence, for sufficiently many Markov steps (i.e., for long range forecasts),
the conditional probability $p_{ij}^{(n)}$ that state $a_j$ occurs n steps after $a_i$ becomes independent of
the original state $a_i$ and approaches the component $t_j$ of $\mathbf{t}$, the fixed probability vector of $\mathbf{P}$.

This leads to the concept that $\mathbf{t}$ represents the stationary distribution of the Markov chain,
i.e., that distribution of transition probabilities that obtains after a large number of steps of
the Markov process. The fixed probability vector $\mathbf{t}$, then, represents the long term likelihood
that each state of the state space $a$ will obtain. The likelihood of state $a_j$ is given by the
component $t_j$ in the long run. In the meteorological sense, the fixed probability vector $\mathbf{t}$
represents climatology, the expected long term relative frequency of occurrence of states of the weather.

The fact that the transition matrix of forecast conditional probabilities $\mathbf{P}^n$ approaches
climatology $\mathbf{T}$ as the forecast period $n\,\Delta t$ lengthens is highly attractive from the point of
view of a practical prediction scheme. Almost any experienced forecaster, when asked for a
forecast beyond the period for which conventional prognostic techniques show skill, will turn to
climatology to construct his "prog."

The convergence of $\mathbf{P}^n$ to the unchanging matrix of climatology $\mathbf{T}$ provides us with a useful
intuitive perception of how the Markov process works. We see that, as time goes on, the effect of
the initial state on the likelihood of occurrence of future states appears to "wear off." This
is a physically reasonable result. We would hardly expect this hour's observation of the visibility
at Spokane to have much bearing on the probability distribution of visibility states there a
year from now. On the other hand, the likelihood of visibility less than 1 mile there an hour
from now would be enhanced greatly if the current observation were to show visibility restrictions.

Thus we can think of the Markov chain as modeling the decorrelation of weather events over
time.
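The fixed probability vector can also be found numerically, since it is the vector that one further Markov step leaves unchanged (t = tP). The sketch below, in plain Python, simply applies the transition matrix of this chapter's three-state example repeatedly to a uniform starting vector until the result stops changing. The function names, the tolerance and the uniform start are assumptions made for illustration, not part of the original report.

```python
# Transition matrix of the three-state example used in this chapter.
P = [
    [0.40, 0.50, 0.10],
    [0.10, 0.50, 0.40],
    [0.01, 0.09, 0.90],
]

def step(dist, matrix):
    """Advance a row probability vector by one Markov step (dist times matrix)."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]

def stationary(matrix, tol=1e-12, max_steps=10_000):
    """Iterate t <- t P from a uniform start until successive vectors agree."""
    n = len(matrix)
    t = [1.0 / n] * n
    for _ in range(max_steps):
        nxt = step(t, matrix)
        if max(abs(a - b) for a, b in zip(nxt, t)) < tol:
            return nxt
        t = nxt
    raise RuntimeError("no convergence; the matrix may not be regular")

t = stationary(P)
print([round(x, 4) for x in t])  # climatology of the three states
```

Because the chain is regular, the starting vector does not matter; any initial distribution "wears off" toward the same climatology.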


9.  State  Probability  Distributions  vs.  Transition  Probabilities. 

So far we have considered only the conditional probabilities expressed in the Markov transition
matrix. Such a probability might, for example, be $p_{12}$, the conditional probability, given the
system is in state 1, that it will change to state 2. For forecast purposes, however, it is often
more useful to have the forecast probabilities of occurrence of each of the Markov states $a_j$ for the
forecast time $t = t_0 + n\,\Delta t$. We consider in this section how these state probabilities can be
obtained.

If $p_{12}$ is the conditional probability of state 2 at time t given state 1 at time $t_0$, and if $q_1$
is the probability of state 1 at time $t_0$, then the contribution of state 1 to the probability of
state 2 at t is simply the product $q_1 p_{12}$. The probability $q_1$ is called an initial probability.
There is actually a distribution of initial probabilities, one for each Markov state, forming an
initial probability vector:

$$\mathbf{q} = [\,q_1 \;\; q_2 \;\; q_3\,]$$

According to the reasoning we used above, the state probability distribution of the system at time
t is simply the product,

$$\mathbf{q}\,\mathbf{P}^{(n)} = \begin{bmatrix} q_1 p_{11}^{(n)} + q_2 p_{21}^{(n)} + q_3 p_{31}^{(n)} \\ q_1 p_{12}^{(n)} + q_2 p_{22}^{(n)} + q_3 p_{32}^{(n)} \\ q_1 p_{13}^{(n)} + q_2 p_{23}^{(n)} + q_3 p_{33}^{(n)} \end{bmatrix}^{\mathsf{T}}$$

In some applications, particularly climatological ones, the initial probability distribution is
fractional and expresses the likelihood $q_j$ that the system will start in Markov state $a_j$. In
weather forecasting applications of the Markov model, however, there is usually no uncertainty about
the initial state, since that state is almost always the observation (at time $t_0$) upon which the
forecast for time t is based. Accordingly, the initial probability distribution is generally a
vector consisting of all zeroes and a single one. If, for example, the initial state is 2
("unsettled"), then

$$\mathbf{q} = [\,0 \;\; 1 \;\; 0\,]$$

and the forecast state probabilities for time t reduce to

$$\mathbf{q}\,\mathbf{P}^{(n)} = [\,p_{21}^{(n)} \;\; p_{22}^{(n)} \;\; p_{23}^{(n)}\,]$$

This result can be verified by performing the matrix multiplication. Note that in obtaining
$\mathbf{P}^{(n)}$ the entire matrix $\mathbf{P}$ must be raised to the power n, not the elements separately.

It is apparent that the Markov model can generate a probabilistically expressed forecast
of the state of the weather at a future time t based on an observation of the state at time $t_0$.
In section 11, we will address the complexity of n-state Markov models needed to express the
state of the weather in all its variety.
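The difference between a known and an uncertain initial state can be illustrated with a short sketch. It assumes the three-state transition matrix of this chapter's example; the fractional vector `uncertain` is a hypothetical value chosen only for illustration, as are the function names. Applying one Markov step n times to the row vector q is equivalent to forming q times the nth power of the whole matrix.

```python
# Transition matrix of the three-state example used in this chapter.
P = [
    [0.40, 0.50, 0.10],
    [0.10, 0.50, 0.40],
    [0.01, 0.09, 0.90],
]

def step(dist, matrix):
    """One Markov step applied to a row probability vector."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]

def state_probabilities(q, matrix, n):
    """State probability distribution after n steps, i.e. q times matrix**n."""
    dist = list(q)
    for _ in range(n):
        dist = step(dist, matrix)
    return dist

observed = [0.0, 1.0, 0.0]    # initial state known to be 2 ("unsettled")
uncertain = [0.2, 0.5, 0.3]   # hypothetical fractional initial distribution

print([round(p, 4) for p in state_probabilities(observed, P, 3)])
print([round(p, 4) for p in state_probabilities(uncertain, P, 3)])
```

With the observed vector the result is just row 2 of the three-step matrix; with the fractional vector it is the q-weighted mixture of all three rows.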


10. Eigenvalue Methods for Powers of Matrices.

In both the classical Markov analysis and the equivalent Markov approach to weather forecasting
(see section 11 below), obtaining the forecast for time $t = t_0 + n\,\Delta t$ involves taking the nth
power of either the Markov transition matrix $\mathbf{P}$ or the equivalent Markov matrix of coefficients
$\mathbf{B}$. For matrices whose dimensionality exceeds three, or for large n, taking powers of the
matrix explicitly is inconvenient. Even some small computers may be insufficiently equipped to do
this job efficiently. And in the absence of a computer or tables of powers of the matrix $\mathbf{P}$ or
$\mathbf{B}$, it becomes difficult to apply Markov techniques to real weather forecasting problems.

Fortunately a speedy, computationally simple means exists for taking powers of a matrix such
as $\mathbf{P}$ without performing the matrix multiplication explicitly. This method is equally suited to
human or computer use and should be resorted to whenever powers of a matrix are desired.

To obtain the nth power of any square matrix $\mathbf{P}$ one applies the rule given by Feller (1968,
pp 428-432):

$$\mathbf{P}^n = \sum_{i=1}^{m} \lambda_i^n \, \frac{\mathbf{x}_i \mathbf{y}_i}{\mathbf{y}_i \mathbf{x}_i}$$

where m is the dimensionality of the m x m matrix $\mathbf{P}$, $\lambda_i$ is the ith eigenvalue of that matrix,
$\mathbf{x}_i$ is the "left" eigenvector associated with $\lambda_i$, and $\mathbf{y}_i$ is the "right" eigenvector associated with
$\lambda_i$. By "left" eigenvectors, we mean eigenvectors of the original matrix $\mathbf{P}$. "Right" eigenvectors
are eigenvectors of its transpose $\mathbf{P}'$. Since all the left eigenvectors $\mathbf{x}_i$ are column vectors (m x 1),
and all the right eigenvectors $\mathbf{y}_i$ are row vectors (1 x m), the products $\mathbf{x}_i \mathbf{y}_i$ are m x m square matrices,
the same size as the original $\mathbf{P}$ matrix. On the other hand, the products $\mathbf{y}_i \mathbf{x}_i$ are (1 x 1), i.e.,
scalars. Since the eigenvalues $\lambda_i$ and their powers $\lambda_i^n$ are also scalars, it is apparent that the
problem of taking the nth power of $\mathbf{P}$ reduces to that of taking the sum of m different m x m
matrices, each weighted by $\lambda_i^n / (\mathbf{y}_i \mathbf{x}_i)$. Once the eigenvalues and associated left and right
eigenvectors $\mathbf{x}_i$ and $\mathbf{y}_i$ are found, the process of obtaining $\mathbf{P}^n$ by this rule becomes trivial.

Normally, finding eigenvalues and eigenvectors is done on a computer, using library subroutines.
To illustrate what the eigenvalues and eigenvectors are, however, we will solve one such problem by
hand, finding the eigenvalues and eigenvectors of the Markov transition matrix $\mathbf{P}$ presented earlier.

The eigenvalues (also called characteristic values, characteristic roots, latent roots,
proper values or proper numbers) of a matrix $\mathbf{P}$ are the roots of the characteristic equation of
the matrix. The eigenvalues $\lambda_i$ are defined such that corresponding to each of them is a non-zero
vector $\mathbf{x}_i$ called an eigenvector for which

$$\mathbf{P}\,\mathbf{x}_i = \lambda_i \mathbf{x}_i$$

For $\mathbf{P}$ with dimensions m x m, there exist m eigenvalues $\lambda_i$. The eigenvalues are scalars. To
each eigenvalue there corresponds an eigenvector $\mathbf{x}_i$ consisting of a column of m elements (right
eigenvectors, however, are considered row vectors).

From the equation above, the relation $\mathbf{P}\,\mathbf{x}_i = \lambda_i \mathbf{x}_i$ is equivalent to

$$(\mathbf{P} - \lambda_i \mathbf{I})\,\mathbf{x}_i = 0$$

where $\mathbf{I}$ is the identity matrix. This is analogous to the system of simultaneous, homogeneous
equations,

$$\mathbf{A}\,\mathbf{x} = 0$$

for whose non-trivial solution we require that the "denominator determinant" be zero, i.e., that
the matrix of coefficients $\mathbf{A}$ be singular (not have an inverse). Mathematically,

$$\det \mathbf{A} = |\mathbf{A}| = 0$$

leads to the solution vector $\mathbf{x}$. In the case of our eigenvalue problem, we require

$$\det(\mathbf{P} - \lambda_i \mathbf{I}) = |\mathbf{P} - \lambda_i \mathbf{I}| = 0$$

This is called the characteristic equation of the matrix $\mathbf{P}$:

$$|\mathbf{P} - \lambda \mathbf{I}| = 0$$

The m eigenvalues are the roots of the characteristic equation.

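Let us expand the determinant and evaluate it:

$$|\mathbf{P} - \lambda \mathbf{I}| = \begin{vmatrix} 0.4 - \lambda & 0.5 & 0.1 \\ 0.1 & 0.5 - \lambda & 0.4 \\ 0.01 & 0.09 & 0.9 - \lambda \end{vmatrix} = 0$$

which yields the cubic equation,

$$\lambda^3 - 1.8\,\lambda^2 + 0.923\,\lambda - 0.123 = 0$$

Standard algebraic methods produce from this cubic the three roots,

$$\lambda_1 = 1.0000 \qquad \lambda_2 = 0.5924 \qquad \lambda_3 = 0.2076$$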

which are the eigenvalues of $\mathbf{P}$. Associated with each eigenvalue $\lambda_i$ is an eigenvector $\mathbf{x}_i$ of m
elements, obtainable from

$$(\mathbf{P} - \lambda_i \mathbf{I})\,\mathbf{x}_i = 0$$

by substituting the appropriate eigenvalue $\lambda_i$ and performing the matrix subtraction. Because $\lambda_1$
is less illustrative than the other eigenvalues, let us work with $\lambda_2$. The result is

$$(\mathbf{P} - \lambda_2 \mathbf{I})\,\mathbf{x}_2 = \begin{bmatrix} -0.1924 & 0.5000 & 0.1000 \\ 0.1000 & -0.0924 & 0.4000 \\ 0.0100 & 0.0900 & 0.3076 \end{bmatrix} \begin{bmatrix} x_{12} \\ x_{22} \\ x_{32} \end{bmatrix} = 0$$

This represents a homogeneous linear system.

Solution by elimination produces the relations,

$$x_{12} = -6.4909\,x_{32} \qquad x_{22} = -2.6971\,x_{32}$$

This result shows that to each eigenvalue there corresponds an infinite number of eigenvectors.
We can obtain one of them by selecting $x_{32} = 1$. Then,

$$\mathbf{x}_2 = \begin{bmatrix} x_{12} \\ x_{22} \\ x_{32} \end{bmatrix} = \begin{bmatrix} -6.4909 \\ -2.6971 \\ 1.0000 \end{bmatrix}$$

We can normalize this eigenvector in the usual way by having its largest element (in magnitude) be +1.
This can be accomplished by dividing all the elements by a value equal to the first element. Then,

$$\mathbf{x}_2 = \begin{bmatrix} 1.0000 \\ 0.4155 \\ -0.1541 \end{bmatrix}$$


By a similar process, we can obtain the eigenvectors corresponding to the remaining eigenvalues:

$$\mathbf{x}_1 = \begin{bmatrix} 1.0000 \\ 1.0000 \\ 1.0000 \end{bmatrix} \qquad \mathbf{x}_3 = \begin{bmatrix} 1.0000 \\ -0.3920 \\ 0.0365 \end{bmatrix}$$
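The hand calculation above can be checked numerically. The sketch below is one possible check, not the method used in the report: it uses the fact that every row of a stochastic matrix sums to one, so that 1 is always a root of the characteristic cubic, divides that root out, and solves the remaining quadratic. It then confirms that each hand-computed column eigenvector nearly satisfies P x = lambda x. The function name `residual` is an illustrative assumption.

```python
import math

# Transition matrix of the three-state example used in this chapter.
P = [
    [0.40, 0.50, 0.10],
    [0.10, 0.50, 0.40],
    [0.01, 0.09, 0.90],
]

# Characteristic cubic: lambda^3 - 1.8 lambda^2 + 0.923 lambda - 0.123 = 0.
# Dividing out the root lambda = 1 leaves lambda^2 - 0.8 lambda + 0.123 = 0.
b, c = -0.8, 0.123
root = math.sqrt(b * b - 4.0 * c)
eigenvalues = [1.0, (-b + root) / 2.0, (-b - root) / 2.0]

def residual(matrix, lam, x):
    """Largest component of (P x - lambda x); close to zero for an eigenpair."""
    n = len(matrix)
    return max(abs(sum(matrix[i][j] * x[j] for j in range(n)) - lam * x[i])
               for i in range(n))

# Column eigenvectors found by hand (second entry of x2 is 2.6971 / 6.4909).
x = [
    [1.0, 1.0, 1.0],
    [1.0, 0.4155, -0.1541],
    [1.0, -0.3920, 0.0365],
]

for lam, vec in zip(eigenvalues, x):
    print(round(lam, 4), residual(P, lam, vec))
```

The residuals are not exactly zero because the eigenvector entries are rounded to four decimal places.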

The eigenvectors $\mathbf{x}_i$ are called "left" eigenvectors. We can find the "right" eigenvectors $\mathbf{y}_i$
by taking the transpose $\mathbf{P}'$ of the matrix $\mathbf{P}$ and solving it, as above, for eigenvalues and
eigenvectors. The eigenvalues of the transpose will be the same as those of the original matrix, but the
eigenvectors will be different. Indeed, we find

$$\mathbf{y}_1 = [\,0.0560 \;\; 0.2360 \;\; 1.0000\,]$$

$$\mathbf{y}_2 = [\,-0.3078 \;\; -0.6922 \;\; 1.0000\,]$$

$$\mathbf{y}_3 = [\,-0.4935 \;\; 1.0000 \;\; -0.5065\,]$$

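To obtain the powers of $\mathbf{P}$ we need the matrix products $\mathbf{x}_i \mathbf{y}_i$ and $\mathbf{y}_i \mathbf{x}_i$. Simple matrix
multiplication produces

$$\mathbf{x}_1 \mathbf{y}_1 = \begin{bmatrix} 0.0560 & 0.2360 & 1.0000 \\ 0.0560 & 0.2360 & 1.0000 \\ 0.0560 & 0.2360 & 1.0000 \end{bmatrix} \qquad \mathbf{y}_1 \mathbf{x}_1 = 1.2920$$

$$\mathbf{x}_2 \mathbf{y}_2 = \begin{bmatrix} -0.3078 & -0.6922 & 1.0000 \\ -0.1279 & -0.2876 & 0.4155 \\ 0.0474 & 0.1066 & -0.1541 \end{bmatrix} \qquad \mathbf{y}_2 \mathbf{x}_2 = -0.7495$$

$$\mathbf{x}_3 \mathbf{y}_3 = \begin{bmatrix} -0.4935 & 1.0000 & -0.5065 \\ 0.1935 & -0.3920 & 0.1985 \\ -0.0180 & 0.0365 & -0.0185 \end{bmatrix} \qquad \mathbf{y}_3 \mathbf{x}_3 = -0.9041$$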

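We can obtain the 12th power of the matrix $\mathbf{P}$ by computing the following:

$$\mathbf{P}^{12} = \frac{1^{12}}{1.2920} \begin{bmatrix} 0.0560 & 0.2360 & 1.0000 \\ 0.0560 & 0.2360 & 1.0000 \\ 0.0560 & 0.2360 & 1.0000 \end{bmatrix} + \frac{0.5924^{12}}{-0.7495} \begin{bmatrix} -0.3078 & -0.6922 & 1.0000 \\ -0.1279 & -0.2876 & 0.4155 \\ 0.0474 & 0.1066 & -0.1541 \end{bmatrix} + \frac{0.2076^{12}}{-0.9041} \begin{bmatrix} -0.4935 & 1.0000 & -0.5065 \\ 0.1935 & -0.3920 & 0.1985 \\ -0.0180 & 0.0365 & -0.0185 \end{bmatrix}$$

with the result that

$$\mathbf{P}^{12} = \begin{bmatrix} 0.0441 & 0.1844 & 0.7715 \\ 0.0437 & 0.1834 & 0.7730 \\ 0.0432 & 0.1824 & 0.7744 \end{bmatrix}$$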

Note that this result is exactly the same as calculated earlier in this chapter by explicit
matrix manipulation. To implement this eigenvalue matrix multiplication method requires storing
either in core or on peripheral storage several m x m matrices $\mathbf{x}_i \mathbf{y}_i$. As shown in section 13 of this
chapter, it is almost never necessary to retain all m of the matrices, since the higher order forms
contribute insignificantly to $\mathbf{P}^n$ or $\mathbf{B}^n$.
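The eigenvalue rule of this section can be sketched in code as a weighted sum of outer products. The example below is a minimal illustration in plain Python using the rounded eigenvalues and eigenvectors found by hand above, and keeping this chapter's naming, in which the column vectors x are the "left" eigenvectors and the row vectors y are the "right" eigenvectors. The function names are assumptions for illustration; a production program would normally obtain the eigenvalues and eigenvectors from a library routine.

```python
# Transition matrix of the three-state example used in this chapter.
P = [
    [0.40, 0.50, 0.10],
    [0.10, 0.50, 0.40],
    [0.01, 0.09, 0.90],
]

# Hand-computed eigenvalues and eigenvectors, rounded to four decimals.
eigenvalues = [1.0, 0.5924, 0.2076]
x = [[1.0, 1.0, 1.0], [1.0, 0.4155, -0.1541], [1.0, -0.3920, 0.0365]]  # columns
y = [[0.0560, 0.2360, 1.0], [-0.3078, -0.6922, 1.0], [-0.4935, 1.0, -0.5065]]  # rows

def spectral_power(n):
    """Sum over i of lambda_i**n * (x_i y_i) / (y_i x_i)."""
    m = len(eigenvalues)
    total = [[0.0] * m for _ in range(m)]
    for lam, xi, yi in zip(eigenvalues, x, y):
        scale = lam ** n / sum(a * b for a, b in zip(yi, xi))  # scalar weight
        for r in range(m):
            for c in range(m):
                total[r][c] += scale * xi[r] * yi[c]  # outer product x_i y_i
    return total

def explicit_power(n):
    """Reference value: n-th power of P by repeated matrix multiplication."""
    m = len(P)
    result = P
    for _ in range(n - 1):
        result = [[sum(result[i][k] * P[k][j] for k in range(m))
                   for j in range(m)] for i in range(m)]
    return result

for row in spectral_power(12):
    print([round(v, 4) for v in row])
```

Because the third eigenvalue raised to the 12th power is of order one part in a hundred million, dropping the third term would not change the printed result, which illustrates why the higher order matrices rarely need to be stored.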

11. An Equivalent Markov Model Based on REEP.

We  have  seen  that  it  is  in  principle  possible  to  develop  a prediction  scheme  in  which  weather 
changes  are  modeled  as  a discrete  time  Markov  process  and  which  is  based  on  a classical  Markov 
transition  matrix  P whose  powers  are  used  to  prepare  forecasts  for  n time  steps  in  the  future. 
The  skill  of  such  a model  will  depend  on  the  extent  to  which  the  weather  behaves  as  required  in 
the  Markov  process. 


There is an important practical limitation on the application of a classical Markov transition
matrix to real problems of prediction. The limitation is that the size of the transition matrix
itself grows exponentially with the number of predictors/predictands and the resolution of each
predictor/predictand. Consider, for example, a weather forecasting scheme involving the cloud
ceiling in five classes $z_1$ through $z_5$, visibility in five classes $z_6$ through $z_{10}$, wind in nine
classes $z_{11}$ through $z_{19}$, and altimeter setting in ten classes $z_{20}$ through $z_{29}$. Under these
circumstances, the present or future weather is considered to be described by a 29-element binary
vector (zeroes and ones). This is a conservative scheme; actual prediction models would typically
encompass an order of magnitude greater number of binary or so called "dummy" variables. But let
us continue with this example for illustrative purposes.


To apply the classical Markov approach, we must determine how many Markov states there are in
the problem. Each combination among the 29 elements of the observation vector constitutes one
such state, for example:

State
Number    z1   z2   z3   z4   z5   ...   zp   ...   zP-1   zP

   1       1    0    0    0    0   ...    0   ...     0     0
   2       0    1    0    0    0   ...    0   ...     0     0
   3       0    0    1    0    0   ...    0   ...     0     0

Each of the variables $z_p$ has two possible outcomes, zero or one ("off" or "on"). The dummy
variables in any one variable category (such as $z_6$ through $z_{10}$ for the visibility category) are
mutually exclusive and exhaustive. Thus, within any category, one and only one z must be "on" at
a time. Under these circumstances, considering the first variable category alone, there are five
Markov states. For each of these, the second category produces five further states, for a total
of 5 x 5 = 25 Markov states. For each of these 25 states, the third category produces nine,
giving a total of 25 x 9 = 225 states. The final category of ten classes brings the total to
225 x 10 = 2,250 Markov states! If $n_c$ is the number of binary variables $z_p$ in the cth category,
and if there are C categories, then the number of Markov states M is given by

$$M = \prod_{c=1}^{C} n_c$$

For the easily imagined case of ten categories each containing ten dummy variable classes, the
number of Markov states is 1 x 10^10 or 10 billion! Clearly the pursuit of a Markov transition
matrix of size (10 billion x 10 billion) is a hopeless exercise. An alternative is needed for
all except the simplest forecasting problems.
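The contrast between the number of Markov states (a product over categories) and the number of dummy variables (a sum over categories) is easy to tabulate. The sketch below uses only the class counts given in the text; the function names are illustrative assumptions.

```python
import math

def markov_state_count(classes_per_category):
    """Number of classical Markov states: the product of the class counts n_c."""
    return math.prod(classes_per_category)

def dummy_variable_count(classes_per_category):
    """Number of binary (dummy) variables: the sum of the class counts n_c."""
    return sum(classes_per_category)

# Ceiling (5), visibility (5), wind (9) and altimeter setting (10).
example = [5, 5, 9, 10]
print(markov_state_count(example), dummy_variable_count(example))

# Ten categories of ten classes each.
large = [10] * 10
print(markov_state_count(large), dummy_variable_count(large))
```

The transition matrix has one row and one column per Markov state, so its size grows with the square of the product, while a scheme built on the dummy variables themselves grows only with the square of the sum.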

A prediction method that yields probabilistic forecasts comparable to those of the classical
Markov method but without the necessity of preparing a Markov transition matrix has been proposed
by Miller (1968) and used in the forecasting experiment described in chapter 10 of this report.

Because Miller's method is based on the Markov requirement for dependence of the current trial only
on the immediately preceding trial, because the predictions turn out to be comparable, and because
the proposed method also involves taking powers of a matrix, we will refer to Miller's technique
as an equivalent Markov model. This method appears to retain much of the beauty and simplicity of
the classical finite Markov chain while avoiding the overwhelming complexity occasioned by the
n-dimensionality in real world observations of the state of physical systems.

Consider, as before, that the present and future weather can be described in terms of C
classes of dummy variables $z_p$. The cth class contains $n_c$ dummies, and

$$P = \sum_{c=1}^{C} n_c$$

is the total number of dummies. For mathematical convenience, an extra dummy variable $z_0$ is included
in the observation and forecast vectors $\mathbf{z}$.* Thus,

$$\mathbf{z} = [\,z_0 \;\; z_1 \;\; z_2 \;\; \cdots \;\; z_p \;\; \cdots \;\; z_{P-1} \;\; z_P\,]$$


Regression estimation of event probabilities (REEP, see chapters of this report) makes
it possible to estimate the likelihood of occurrence of condition $z_p$ at time t, given the observed
conditions at time $t_0$, i.e.,

$$\Pr(z_{p,t} = 1 \mid \mathbf{z}_{t_0})$$

by means of a regression equation of the form,

$$\Pr(z_{p,t} = 1 \mid \mathbf{z}_{t_0}) = b_{p,0}\,z_{0,t_0} + b_{p,1}\,z_{1,t_0} + b_{p,2}\,z_{2,t_0} + \cdots + b_{p,P}\,z_{P,t_0}$$

where the coefficients $b_{p,0}, \ldots, b_{p,P}$ are determined by a least squares technique and where the dummy
variables $z_p$ can be reduced in number by application of a method such as screening regression.

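In the equivalent Markov approach, there must be as many predictands and prediction equations
as there are predictors, including $z_0$, which is always unity both as a predictor and as a predictand.
Thus, a system of multiple linear regression equations of the REEP variety emerges as follows:

$$\begin{aligned}
\Pr(z_{0,t} = 1 \mid \mathbf{z}_{t_0}) &= b_{0,0}\,z_{0,t_0} + b_{0,1}\,z_{1,t_0} + b_{0,2}\,z_{2,t_0} + \cdots + b_{0,P}\,z_{P,t_0} \\
\Pr(z_{1,t} = 1 \mid \mathbf{z}_{t_0}) &= b_{1,0}\,z_{0,t_0} + b_{1,1}\,z_{1,t_0} + b_{1,2}\,z_{2,t_0} + \cdots + b_{1,P}\,z_{P,t_0} \\
\Pr(z_{2,t} = 1 \mid \mathbf{z}_{t_0}) &= b_{2,0}\,z_{0,t_0} + b_{2,1}\,z_{1,t_0} + b_{2,2}\,z_{2,t_0} + \cdots + b_{2,P}\,z_{P,t_0} \\
&\;\;\vdots \\
\Pr(z_{P,t} = 1 \mid \mathbf{z}_{t_0}) &= b_{P,0}\,z_{0,t_0} + b_{P,1}\,z_{1,t_0} + b_{P,2}\,z_{2,t_0} + \cdots + b_{P,P}\,z_{P,t_0}
\end{aligned}$$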

In the example above involving 29 classes of ceiling, visibility, wind and altimeter setting,
P = 29, and there are 30 equations in 30 terms.** These equations form one system of equations,
not four separate systems for the four variable categories. Binding a variety of predictors and
predictands together in a single, jointly determined system gives the prediction scheme greater
skill than would be attainable using separate systems of equations.

This system of REEP equations can conveniently be represented in matrix form as

$$\mathbf{R} = \mathbf{B}\,\mathbf{z}_{t_0}$$

where $\mathbf{R}$ is the (P+1) x 1 column vector of probabilities of $z_0, z_1, z_2, \ldots, z_P$ being "on" at time
$t = t_0 + \Delta t$, and $\mathbf{z}_{t_0}$ is the similarly dimensioned column vector of observations $z_0, z_1, z_2, \ldots, z_P$ at
time $t_0$. $\mathbf{B}$ is the (P+1) x (P+1) matrix of REEP coefficients and represents a probability generating
function:

$$\mathbf{B} = \begin{bmatrix} b_{0,0} & b_{0,1} & b_{0,2} & \cdots & b_{0,P} \\ b_{1,0} & b_{1,1} & b_{1,2} & \cdots & b_{1,P} \\ b_{2,0} & b_{2,1} & b_{2,2} & \cdots & b_{2,P} \\ \vdots & \vdots & \vdots & & \vdots \\ b_{P,0} & b_{P,1} & b_{P,2} & \cdots & b_{P,P} \end{bmatrix}$$

*$z_0$ is always unity in any observation or forecast.

**The added term and equation is due to $z_0$. $b_{0,0}$ must be unity, and all other $b_{0,p}$ terms must be
zero.

To obtain the forecast probabilities $\mathbf{R}$ of the state of the system at time $t = t_0 + \Delta t$,
we need only postmultiply the matrix $\mathbf{B}$ with the observation vector $\mathbf{z}_{t_0}$ describing the weather at time
$t_0$. To obtain a forecast for time $t = t_0 + n\,\Delta t$, which is n time steps in the future, one
simply uses the nth power of $\mathbf{B}$ as in the classical Markov technique:

$$\mathbf{R} = \mathbf{B}^n\,\mathbf{z}_{t_0}$$

where the powers $\mathbf{B}^n$ can be obtained by the eigenvalue technique discussed above.

It should be noted that although powers of $\mathbf{B}$ are used to obtain the forecast probabilities $\mathbf{R}$
for several time steps in the future, the matrix $\mathbf{B}$ is nevertheless not equal to the Markov
transition matrix $\mathbf{P}$. It is rather the case that the forecast probabilities $\mathbf{R}$ are comparable to those
produced by the classical Markov method. For a prediction scheme in one category, the
forecast probabilities are not just comparable; they are identical:

$$\mathbf{q}\,\mathbf{P}^n = \mathbf{B}^n\,\mathbf{z}_{t_0}$$

We have made the point that equivalent Markov forecasts are not in general identical to
classical Markov forecasts, and we have used the term "comparable" to describe the agreement between the
two methods. Why don't the methods agree exactly? The reason is that no 100 x 100 matrix of REEP
coefficients $\mathbf{B}$ can be expected to reproduce fully all the non-linear atmospheric state change
relationships embodied in a 10 billion x 10 billion Markov transition matrix $\mathbf{P}$. The REEP equations
in effect linearize the prediction scheme by neglecting the nonlinear Boolean combinations of
dummy variables that constitute most of the Markov states. If every such Boolean combination were
resurrected and made use of as a predictor in the equivalent Markov scheme, then the latter would
produce forecasts in exact agreement with those of the classical Markov model. Of course, at
that point the equivalent Markov scheme would be just as unwieldy in a practical forecast situation
as the classical Markov technique it is intended to replace. In practice, since the equivalent
Markov model is quite skillful in forecasting the weather (see chapter 10), it is apparent that
linearizing the prediction scheme does not unacceptably impair the prognostic performance of the
model.

One special case exists in which the predictions of the equivalent Markov model agree exactly
with those of the classical technique. This is the case where the prediction scheme includes only
one variable category (e.g., ceiling alone or visibility alone, with no "additional" predictors).
In this case, each Markov state of the classical technique corresponds to exactly one binary
variable of the equivalent Markov model, so that the equivalent Markov model and the classical
Markov model coincide.
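The one-category special case can be illustrated with a small sketch. It rests on two simplifying assumptions that are not taken from the report: the extra dummy variable z0 is omitted, and the regression coefficient b(p,k) is taken to equal the conditional relative frequency of state p following state k, which is what a least squares fit on mutually exclusive dummy variables produces. Under those assumptions the coefficient matrix is simply the transpose of the transition matrix, here that of this chapter's three-state example. All names are illustrative.

```python
# Transition matrix of the three-state example used in this chapter.
P = [
    [0.40, 0.50, 0.10],
    [0.10, 0.50, 0.40],
    [0.01, 0.09, 0.90],
]
N = len(P)

# Assumed one-category coefficient matrix (z0 omitted): B[p][k] = P[k][p].
B = [[P[k][p] for k in range(N)] for p in range(N)]

def equivalent_forecast(z, n):
    """Apply B to the column observation vector z, n times (B**n z)."""
    r = list(z)
    for _ in range(n):
        r = [sum(B[p][k] * r[k] for k in range(N)) for p in range(N)]
    return r

def classical_forecast(q, n):
    """Apply P to the row probability vector q, n times (q P**n)."""
    d = list(q)
    for _ in range(n):
        d = [sum(d[i] * P[i][j] for i in range(N)) for j in range(N)]
    return d

z = [0, 1, 0]  # observation: the second dummy variable (state 2) is "on"
print([round(v, 4) for v in equivalent_forecast(z, 3)])
print([round(v, 4) for v in classical_forecast(z, 3)])
```

With more than one category this identity no longer holds, because the dummy variables then number far fewer than the Markov states.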

The equivalent Markov model has a number of features of special interest that make it
attractive from the point of view of practical prediction:

- When using the equivalent Markov method, it is unnecessary to prepare an
explicit Markov transition probability matrix, which is often impracticably
large in real world prediction problems.

- The probability generating function $\mathbf{B}$, when subjected to eigenvalue analysis,
reveals the component of the forecast probability vector due to climatology
and the components due to higher order, mean-departure influences. The contribution of
climatology relative to the other terms is always readily obtainable in terms
of ratios of powers of normalized eigenvalues. Convergence of the Markov
process toward the steady climatological state is readily apparent through the
eigenvalue analysis.

- The model can accommodate uncertainty in specification of the initial state.
As is the case with the classical Markov method, one simply uses a probabilistic
initial vector instead of a deterministic one when conditions are not known
exactly. Thus, missing observations do not cripple the prediction scheme.

- Through the method of generalized statistical operators, discussed below, a single
equivalent Markov model can be made applicable to large regions or even to the
whole globe, making it unnecessary to prepare probability generating matrices $\mathbf{B}$ for every
weather station. Network operators can also be used if necessary.

- Non-linear Boolean combinations of predictors/predictands as well as time lags
can be incorporated in the model. The time lag feature makes it possible in part to
circumvent the classical Markov model's independence of the outcome of trials earlier than
that immediately preceding.

- With wise choice of predictors and predictands, the equivalent Markov model can be
made to accommodate asynoptic data such as pilot reports, radar weather observations
and satellite information. The model can be run at any time a forecast is needed.

- The skill of the prediction scheme can be improved by incorporation of predictors
beyond the minimum set dictated by the requirement that each predictand must also
serve as a predictor. Naturally, when "additional" predictors are made use of,
corresponding "additional" predictands appear. In practice, this is handled either
by disregarding the superfluous predictions or by abbreviating the forecast
algorithm (matrix multiplication) such that the unneeded predictions are simply not made.

- The model is well suited for the use of a special class of predictors referred to as
model output statistics (MOS). Typically, MOS predictors are produced by numerical
weather prediction models and represent forecast values of various atmospheric parameters
applicable at some future time. Examples useful in visibility forecasting, to consider
one application, include vertical motion at 850 mb and winds at the top of the Ekman layer.
Under the "imperfect prog" or MOS approach to forecasting, statistical relations are
developed between these MOS predictors and the occurrence of the weather elements
being forecast, such as ceiling and visibility. The equivalent Markov model can handle
both MOS predictors (imperfect prog approach) and observed predictors (perfect prog
approach) in the same model equations. If the MOS predictors selected are appropriate to
the prediction task at hand, it can be expected that the skill of the statistical scheme
will ride along on the skill of the dynamical model.


12. Equivalent Markov and Classical Markov Models: A Comparative Example.

It is not obvious that the equivalent Markov model presented above is in fact equivalent to
the classical finite Markov chain model given earlier. Rather than prove the equivalency
mathematically, let us demonstrate it by means of an example. While less rigorous than a proof, the
example will highlight certain computational methods useful in application of the equivalent
Markov technique.

Consider the problem of forecasting the cloud ceiling at Kelly AFB, Texas. For simplicity, we
will pose this problem in terms of one predictor/predictand category, the ceiling itself. We
will subdivide the one category into three classes, $z_1$, $z_2$ and $z_3$, as follows:

                         Table 1

            DUMMY VARIABLES AND MARKOV STATES
               FOR CEILING HEIGHT PROBLEM

Dummy Variable          Definition               Markov State

     z1            0 ft ≤ CIG < 3,000 ft               1
     z2        3,000 ft ≤ CIG < 15,000 ft              2
     z3       15,000 ft ≤ CIG                          3


Because this problem has only one category of predictor/predictand, the Markov states exactly
parallel the dummy variables. This convenient property does not hold true for problems with more
than one predictor/predictand, and the lack of correspondence between dummies and states makes more
complex any comparison between equivalent Markov and classical Markov techniques. The classical
Markov method forecasts, for example, the probability of occurrence of Markov state 12, which is
a particular combination of dummy variables being "on" (including, say, $z_3$ and $z_4$). To get the
probability of one such dummy, say $z_4$, being on, we would have to add the classical Markov
probability forecasts for any Markov states in which $z_4$ is categorized as "on." Only then would we
have a forecast probability of $z_4$ for comparison with the equivalent Markov forecast. It is to
avoid this final step of adding the classical Markov predictions that we have limited the number
of variable categories in this example to one.

To permit the reader to verify these results by hand, the data base for this example has been
limited to 48 hours of actual ceiling observations for Kelly AFB, TX, during the period 11-12
April 1950. These are reproduced in Table 2. In practice, such a data base is much too small.
In general, periods of record four orders of magnitude longer than this are used to ensure
robustness of the prediction scheme when applied to independent data.


Table 2

48 HOURS OF OBSERVATIONS OF CLOUD CEILING AT KELLY AFB, TX
11-12 April 1950

Date/Time (L)    Ceiling (ft)    Markov State

   11/00              800             1
      01              800             1
      02              800             1
      03            2,500             1
      04            2,600             1
      05            1,700             1
      06            2,300             1
      07            3,000             2
      08            3,000             2
      09            4,500             2
      10           25,000             3
      11           25,000             3
      12           25,000             3
      13           25,000             3
      14             None             3
      15             None             3
      16             None             3
      17             None             3
      18             None             3
      19           25,000             3
      20           25,000             3
      21            5,000             2
      22            4,500             2
      23            4,000             2
   12/00           10,000             2
      01           11,000             2
      02              900             1
      03              800             1
      04            7,000             2
      05            1,000             1
      06            7,000             2
      07              500             1
      08              600             1
      09            7,000             2
      10            1,900             1
      11            1,900             1
      12           25,000             3
      13           25,000             3
      14           25,000             3
      15           25,000             3
      16           25,000             3
      17           25,000             3
      18           25,000             3
      19           25,000             3
      20           25,000             3
      21           25,000             3
      22           25,000             3
      23            1,900             1
   13/00            1,600             1
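
The Markov state column of Table 2 can be reproduced mechanically from the ceiling column. The sketch below assumes class limits of below 3,000 ft for state 1, 3,000-15,000 ft for state 2, and above 15,000 ft or no ceiling for state 3. The 3,000-15,000 ft range for z_2 is quoted later in this chapter; the handling of the exact boundaries and the function name `ceiling_to_state` are assumptions made for illustration.

```python
from typing import Optional


def ceiling_to_state(ceiling_ft: Optional[int]) -> int:
    """Map a ceiling report (feet, or None for no ceiling) to Markov state 1, 2 or 3."""
    if ceiling_ft is None or ceiling_ft > 15000:
        return 3  # high ceiling, or no ceiling reported
    if ceiling_ft >= 3000:
        return 2  # 3,000-15,000 ft (inclusive boundaries are an assumption)
    return 1      # below 3,000 ft


# A few ceilings taken from Table 2.
sample = [800, 2600, 3000, 4500, 11000, 25000, None]
states = [ceiling_to_state(c) for c in sample]
print(states)
```

Every ceiling in Table 2 falls well inside one of the three classes, so the boundary assumption does not affect this example.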

Explicit counting of Markov state transitions in the data of Table 2 gives rise to the classical
Markov transition probability matrix P shown in Table 3. Note the dominance of persistence along
the main diagonal. Note also, for the data base selected, how unlikely it is that bad weather will
improve (see row 1). In this case, probably because of the limited data base used, the Markov
model will strongly predict deteriorating weather, given that conditions are already marginal
(see row 2). The greater dominance of persistence in good weather regimes is seen in row 3.

Table 3

MARKOV TRANSITION MATRIX*
FORMED BY EXPLICIT COUNTING OF STATE CHANGES

                            Final Markov State
                        1          2          3         Total

Initial       1      0.6667     0.2667     0.0667      1.0000
Markov                (10)       ( 4)       ( 1)        (15)
State
              2      0.3636     0.5455     0.0909      1.0000
                      ( 4)       ( 6)       ( 1)        (11)

              3      0.0455     0.0455     0.9091      1.0000
                      ( 1)       ( 1)       (20)        (22)
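
The counting that produces Table 3 can be sketched in a few lines. The state sequence below is transcribed from the Markov State column of Table 2; the names `STATES`, `transition_counts` and `transition_matrix` are illustrative only.

```python
# Markov states for the 49 hourly observations of Table 2 (11/00 through 13/00).
STATES = ([1] * 7 + [2] * 3 + [3] * 11 + [2] * 5
          + [1, 1, 2, 1, 2, 1, 1, 2, 1, 1] + [3] * 11 + [1, 1])


def transition_counts(states, n_states=3):
    """Count transitions from each state (row) to each following state (column)."""
    counts = [[0] * n_states for _ in range(n_states)]
    for initial, final in zip(states, states[1:]):
        counts[initial - 1][final - 1] += 1
    return counts


def transition_matrix(counts):
    """Divide each row of counts by its total to obtain transition probabilities."""
    return [[c / sum(row) for c in row] for row in counts]


COUNTS = transition_counts(STATES)
P = transition_matrix(COUNTS)
for row in P:
    print(["%.4f" % p for p in row])
```

The 49 observations yield 48 transitions, which is why the row totals in Table 3 sum to 48.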


One-hour forecasts, valid at time t_0 + Δt, can be made directly from the transition
probability matrix P in Table 3. The three possible forecasts, generated from the three possible
initial states, are shown in the first part of Table 4 below. It is likewise possible to make a
6-hour forecast, for time t_0 + 6Δt, from the sixth power of P:

\[
P^{6} =
\begin{bmatrix}
0.4082 & 0.2924 & 0.2994 \\
0.3987 & 0.2871 & 0.3142 \\
0.2041 & 0.1571 & 0.6388
\end{bmatrix}
\]

The three possible 6-hour forecasts are also shown in Table 4:

Table 4

CLASSICAL MARKOV FORECASTS

               Initial State         Probability of Final State
Time          1     2     3            1          2          3

1-hour        1     0     0         0.6667     0.2667     0.0667
              0     1     0         0.3636     0.5455     0.0909
              0     0     1         0.0455     0.0455     0.9091

6-hour        1     0     0         0.4082     0.2924     0.2994
              0     1     0         0.3987     0.2871     0.3142
              0     0     1         0.2041     0.1571     0.6388
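
The 6-hour rows of Table 4 follow from repeated multiplication of the 1-hour transition matrix by itself. A minimal sketch, using the counts of Table 3 and helper names (`mat_mul`, `mat_pow`) chosen here for illustration:

```python
def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def mat_pow(m, n):
    """n-th power of a square matrix by repeated multiplication."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result


# Transition probabilities from the counts in Table 3.
P = [[10 / 15, 4 / 15, 1 / 15],
     [4 / 11, 6 / 11, 1 / 11],
     [1 / 22, 1 / 22, 20 / 22]]

P6 = mat_pow(P, 6)
for row in P6:
    print(["%.4f" % p for p in row])
```

Each row of `P6` is the 6-hour forecast for one initial state and, like the rows of P, sums to one.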

The question is, can these probabilistic forecasts be reproduced by the equivalent Markov
model presented above? To see whether this is possible, let us first transform our data into
binary, or so-called "dummy," variables, including a z_0 variable that is always on. Note that the
final state for the nth observation also serves as the initial state for the (n+1)th observation.
The data expressed in terms of dummy variables are shown in Table 5.

*In Table 3, transition probabilities are shown as fractional values. Counts are shown as
integers in parentheses.


Table 5

MARKOV ANALYSIS IN TERMS OF DUMMY VARIABLES
SUITABLE FOR USE IN MULTIPLE LINEAR REGRESSION

Observation        Initial State (t_0)                  Final State (t)
     N        z_0,t0  z_1,t0  z_2,t0  z_3,t0      z_0,t   z_1,t   z_2,t   z_3,t

     1           1       1       0       0          1       1       0       0
     2           1       1       0       0          1       1       0       0
     3           1       1       0       0          1       1       0       0
     4           1       1       0       0          1       1       0       0
     5           1       1       0       0          1       1       0       0
     6           1       1       0       0          1       1       0       0
     7           1       1       0       0          1       0       1       0
     8           1       0       1       0          1       0       1       0
     9           1       0       1       0          1       0       1       0
    10           1       0       1       0          1       0       0       1
    11           1       0       0       1          1       0       0       1
    12           1       0       0       1          1       0       0       1
    13           1       0       0       1          1       0       0       1
    14           1       0       0       1          1       0       0       1
    15           1       0       0       1          1       0       0       1
    16           1       0       0       1          1       0       0       1
    17           1       0       0       1          1       0       0       1
    18           1       0       0       1          1       0       0       1
    19           1       0       0       1          1       0       0       1
    20           1       0       0       1          1       0       0       1
    21           1       0       0       1          1       0       1       0
    22           1       0       1       0          1       0       1       0
    23           1       0       1       0          1       0       1       0
    24           1       0       1       0          1       0       1       0
    25           1       0       1       0          1       0       1       0
    26           1       0       1       0          1       1       0       0
    27           1       1       0       0          1       1       0       0
    28           1       1       0       0          1       0       1       0
    29           1       0       1       0          1       1       0       0
    30           1       1       0       0          1       0       1       0
    31           1       0       1       0          1       1       0       0
    32           1       1       0       0          1       1       0       0
    33           1       1       0       0          1       0       1       0
    34           1       0       1       0          1       1       0       0
    35           1       1       0       0          1       1       0       0
    36           1       1       0       0          1       0       0       1
    37           1       0       0       1          1       0       0       1
    38           1       0       0       1          1       0       0       1
    39           1       0       0       1          1       0       0       1
    40           1       0       0       1          1       0       0       1
    41           1       0       0       1          1       0       0       1
    42           1       0       0       1          1       0       0       1
    43           1       0       0       1          1       0       0       1
    44           1       0       0       1          1       0       0       1
    45           1       0       0       1          1       0       0       1
    46           1       0       0       1          1       0       0       1
    47           1       0       0       1          1       1       0       0
    48           1       1       0       0          1       1       0       0
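
The coding of Table 5 can be generated directly from the state sequence of Table 2. The following is a minimal sketch; `STATES`, `indicator` and `dummy_rows` are names introduced here for illustration and are not taken from the report's appendix.

```python
# Markov states for the 49 hourly observations of Table 2.
STATES = ([1] * 7 + [2] * 3 + [3] * 11 + [2] * 5
          + [1, 1, 2, 1, 2, 1, 1, 2, 1, 1] + [3] * 11 + [1, 1])


def indicator(state, n_states=3):
    """Return [z_0, z_1, ..., z_n]: z_0 is always on, z_k is on only for state k."""
    return [1] + [1 if state == k else 0 for k in range(1, n_states + 1)]


def dummy_rows(states):
    """One row per transition: initial-state dummies followed by final-state dummies."""
    return [indicator(a) + indicator(b) for a, b in zip(states, states[1:])]


ROWS = dummy_rows(STATES)
print(len(ROWS))   # number of observations N
print(ROWS[6])     # observation 7: state 1 followed by state 2
```

Because the final state of observation n is reused as the initial state of observation n+1, 49 hourly reports give 48 rows.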

From Table 5, the binary nature of the dummy variables makes it exceedingly simple to obtain
a sum of squares and cross products matrix, such as that below. Such matrices are also called
SSCP matrices. Note that this matrix is symmetrical. It is economical to retain only the upper
or lower triangular elements even though this may seem to make the programming more complex.

\[
S =
\begin{bmatrix}
N & \sum z_{1,t_0} & \sum z_{2,t_0} & \sum z_{3,t_0} & \sum y_t \\
\sum z_{1,t_0} & \sum z_{1,t_0}^2 & \sum z_{1,t_0} z_{2,t_0} & \sum z_{1,t_0} z_{3,t_0} & \sum y_t z_{1,t_0} \\
\sum z_{2,t_0} & \sum z_{1,t_0} z_{2,t_0} & \sum z_{2,t_0}^2 & \sum z_{2,t_0} z_{3,t_0} & \sum y_t z_{2,t_0} \\
\sum z_{3,t_0} & \sum z_{1,t_0} z_{3,t_0} & \sum z_{2,t_0} z_{3,t_0} & \sum z_{3,t_0}^2 & \sum y_t z_{3,t_0} \\
\sum y_t & \sum z_{1,t_0} y_t & \sum z_{2,t_0} y_t & \sum z_{3,t_0} y_t & \sum y_t^2
\end{bmatrix}
\]

In this notation, the second subscript on the z's, being t_0, emphasizes the point that this is
predictor information, available at time t_0. The variable y represents the predictand, valid at
time t. A y is used instead of z because the y stands for any of the three predictand z's:

\[
y = z_{1,t} \ \text{or} \ z_{2,t} \ \text{or} \ z_{3,t}
\]

There accordingly exist three y-rows for the SSCP matrix S. Any one matrix can accommodate only
one y-row, so in fact there are three SSCP matrices, S_1, S_2 and S_3, each having the same first
four rows and a different y-row. Each matrix S will serve indirectly as the basis for development
of one REEP equation. In computational practice, the duplication among the three matrices S
makes it unnecessary to carry all three of them, but we show them as distinct in Table 6 for
illustrative purposes.


Table 6

SSCP MATRICES S AND LEFT-OUT MATRICES L
FOR EQUIVALENT MARKOV PROBLEM
(Prepared from Data in Table 5)

         | 48  15  11  22  15 |
         | 15  15   0   0  10 |               | 48  15  11  15 |
S_1  =   | 11   0  11   0   4 |      L_1  =   | 15  15   0  10 |
         | 22   0   0  22   1 |               | 11   0  11   4 |
         | 15  10   4   1  15 |               | 15  10   4  15 |

         | 48  15  11  22  11 |
         | 15  15   0   0   4 |               | 48  15  11  11 |
S_2  =   | 11   0  11   0   6 |      L_2  =   | 15  15   0   4 |
         | 22   0   0  22   1 |               | 11   0  11   6 |
         | 11   4   6   1  11 |               | 11   4   6  11 |

         | 48  15  11  22  22 |
         | 15  15   0   0   1 |               | 48  15  11  22 |
S_3  =   | 11   0  11   0   1 |      L_3  =   | 15  15   0   1 |
         | 22   0   0  22  20 |               | 11   0  11   1 |
         | 22   1   1  20  22 |               | 22   1   1  22 |

The REEP equations cannot be derived directly from the SSCP matrices S because the use of
mutually exclusive and exhaustive dummy variables gives rise to redundant rows and columns in such
matrices. For example, in the present case, all the information about the state of variable z_3 is
given by the states of z_1 and z_2. We must remove from each matrix S one redundant row and column
for each category of variable. In the present example, there is only one variable category, the
ceiling; thus, we must delete one row and one column. Arbitrarily, we select the row and column
corresponding to z_3 for deletion. The reduced matrices L in "left-out variable" form are also
shown in Table 6.
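
The SSCP and left-out matrices of Table 6 can be accumulated directly from the dummy-variable rows. The sketch below rebuilds those rows from the Table 2 state sequence and then forms S_1 and L_1; the function names `sscp` and `leave_out` are illustrative assumptions.

```python
# Markov states for the 49 hourly observations of Table 2.
STATES = ([1] * 7 + [2] * 3 + [3] * 11 + [2] * 5
          + [1, 1, 2, 1, 2, 1, 1, 2, 1, 1] + [3] * 11 + [1, 1])


def indicator(state):
    """[z_0, z_1, z_2, z_3] with z_0 always on."""
    return [1] + [1 if state == k else 0 for k in (1, 2, 3)]


ROWS = [indicator(a) + indicator(b) for a, b in zip(STATES, STATES[1:])]


def sscp(rows, predictand):
    """SSCP matrix of [z_0..z_3 at t_0 | y], where y is z_predictand at time t."""
    data = [row[:4] + [row[4 + predictand]] for row in rows]
    size = len(data[0])
    return [[sum(r[i] * r[j] for r in data) for j in range(size)]
            for i in range(size)]


def leave_out(matrix, index):
    """Delete one row and the matching column (here, those of the left-out dummy)."""
    return [[v for j, v in enumerate(row) if j != index]
            for i, row in enumerate(matrix) if i != index]


S1 = sscp(ROWS, 1)
L1 = leave_out(S1, 3)   # drop the row and column belonging to z_3
for row in L1:
    print(row)
```

Since every entry is a count of co-occurrences of binary variables, the off-diagonal predictor entries are zero and the y-row holds the transition counts of Table 3.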


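Statistics texts such as Tatsuoka (1971) present the matrix formulation of multiple linear
regression problems. In the present case, we require only two regression equations in left-out
variable form:

\[
\Pr(z_{1,t} = 1 \mid z_{t_0}) = a_{1,0} z_{0,t_0} + a_{1,1} z_{1,t_0} + a_{1,2} z_{2,t_0}
\]
\[
\Pr(z_{2,t} = 1 \mid z_{t_0}) = a_{2,0} z_{0,t_0} + a_{2,1} z_{1,t_0} + a_{2,2} z_{2,t_0}
\]

where the regression coefficients are obtained from elements of the Crout auxiliary matrix D,

\[
D =
\begin{bmatrix}
d_{1,1} & d_{1,2} & d_{1,3} & d_{1,4} \\
d_{2,1} & d_{2,2} & d_{2,3} & d_{2,4} \\
d_{3,1} & d_{3,2} & d_{3,3} & d_{3,4} \\
d_{4,1} & d_{4,2} & d_{4,3} & d_{4,4}
\end{bmatrix}
\]

whose creation is discussed in chapter 2 above. In this case, with l_{i,j} denoting the elements
of the left-out SSCP matrix L,

\[
\begin{aligned}
d_{11} &= l_{11}, \quad d_{21} = l_{21}, \quad d_{31} = l_{31}, \quad d_{41} = l_{41} \\
d_{12} &= d_{21} / d_{11}, \quad d_{13} = d_{31} / d_{11}, \quad d_{14} = d_{41} / d_{11} \\
d_{22} &= l_{22} - d_{12} d_{21}, \quad d_{32} = l_{32} - d_{12} d_{31}, \quad d_{42} = l_{42} - d_{12} d_{41} \\
d_{23} &= d_{32} / d_{22}, \quad d_{24} = d_{42} / d_{22} \\
d_{33} &= l_{33} - d_{13} d_{31} - d_{23} d_{32}, \quad d_{43} = l_{43} - d_{13} d_{41} - d_{23} d_{42} \\
d_{34} &= d_{43} / d_{33} \\
d_{44} &= l_{44} - d_{41} d_{14} - d_{42} d_{24} - d_{43} d_{34}
\end{aligned}
\]

and the regression coefficients are

\[
\begin{aligned}
a_{i,2} &= d_{3,4} \\
a_{i,1} &= d_{2,4} - d_{2,3} a_{i,2} \\
a_{i,0} &= d_{1,4} - d_{1,3} a_{i,2} - d_{1,2} a_{i,1}
\end{aligned}
\]

We shall show the computation of the regression coefficients a_{1,j} for the first "left-out"
regression equation and allow the reader to supply corresponding detail for the other equations.
The Crout auxiliary matrix D_1 corresponding to the left-out SSCP matrix L_1 is

\[
D_1 =
\begin{bmatrix}
48.0000 & 0.3125 & 0.2292 & 0.3125 \\
15.0000 & 10.3125 & -0.3333 & 0.5152 \\
11.0000 & -3.4375 & 7.3333 & 0.3182 \\
15.0000 & 5.3125 & 2.3333 & 6.8333
\end{bmatrix}
\]

The left-out regression coefficients are then

\[
a_{1,0} = 0.04545, \qquad a_{1,1} = 0.6212, \qquad a_{1,2} = 0.3182
\]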
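
The Crout reduction and back-substitution can be sketched for a general augmented matrix as follows. This is an illustrative implementation written for this example, not the routine of chapter 2; it uses exact fractions so that the result can be compared with the rounded values quoted in the text, and the names `crout_auxiliary` and `back_substitute` are assumptions.

```python
from fractions import Fraction


def crout_auxiliary(matrix):
    """Crout auxiliary matrix: lower triangle unscaled, upper triangle divided by pivots."""
    n = len(matrix)
    d = [[Fraction(0)] * n for _ in range(n)]
    for j in range(n):
        # Column j on and below the diagonal: d_ij = l_ij - sum_k d_ik d_kj.
        for i in range(j, n):
            d[i][j] = Fraction(matrix[i][j]) - sum(d[i][k] * d[k][j] for k in range(j))
        # Row j to the right of the diagonal, scaled by the pivot d_jj.
        for c in range(j + 1, n):
            d[j][c] = (Fraction(matrix[j][c])
                       - sum(d[j][k] * d[k][c] for k in range(j))) / d[j][j]
    return d


def back_substitute(d):
    """Regression coefficients from the last column of the auxiliary matrix."""
    p = len(d) - 1                      # number of predictors, including z_0
    a = [Fraction(0)] * p
    for i in range(p - 1, -1, -1):
        a[i] = d[i][p] - sum(d[i][k] * a[k] for k in range(i + 1, p))
    return a


# Left-out SSCP matrix L_1 from Table 6.
L1 = [[48, 15, 11, 15], [15, 15, 0, 10], [11, 0, 11, 4], [15, 10, 4, 15]]
D1 = crout_auxiliary(L1)
A1 = back_substitute(D1)
print([round(float(x), 4) for x in A1])
```

The three coefficients are the intercept a_{1,0} and the weights on z_1 and z_2. With saturated dummy predictors they simply re-express the conditional frequencies of Table 3: for example, a_{1,0} + a_{1,1} is 10/15.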


Once the other regression coefficients have been computed, a left-out matrix of coefficients
A can be formed as follows:

\[
A =
\begin{bmatrix}
a_{0,0} & a_{0,1} & a_{0,2} \\
a_{1,0} & a_{1,1} & a_{1,2} \\
a_{2,0} & a_{2,1} & a_{2,2}
\end{bmatrix}
=
\begin{bmatrix}
1.0000 & 0.0000 & 0.0000 \\
0.0455 & 0.6212 & 0.3182 \\
0.0455 & 0.2212 & 0.5000
\end{bmatrix}
\]

Note that the first row is supplied to accommodate the trivial prediction equation. This first,
or "zero," row must always consist of a one in the first element and zeroes in the remaining
elements because z_{0,t} and z_{0,t_0} are both always one.


Forecasts in the form

\[
\begin{bmatrix}
\Pr(z_{0,t} = 1 \mid z_{t_0}) \\
\Pr(z_{1,t} = 1 \mid z_{t_0}) \\
\Pr(z_{2,t} = 1 \mid z_{t_0})
\end{bmatrix}
=
\begin{bmatrix}
a_{0,0} & a_{0,1} & a_{0,2} \\
a_{1,0} & a_{1,1} & a_{1,2} \\
a_{2,0} & a_{2,1} & a_{2,2}
\end{bmatrix}
\begin{bmatrix}
z_{0,t_0} \\ z_{1,t_0} \\ z_{2,t_0}
\end{bmatrix}
\]

can be made from these equations as they stand. For example, for the initial conditions

\[
z_{0,t_0} = 1, \qquad z_{1,t_0} = 0, \qquad z_{2,t_0} = 1
\]

the left-out form of the equation predicts

\[
\begin{aligned}
\Pr(z_{0,t}) &= 1.0000 + 0.0000 + 0.0000 = 1.0000 \\
\Pr(z_{1,t}) &= 0.0455 + 0.0000 + 0.3182 = 0.3636 \\
\Pr(z_{2,t}) &= 0.0455 + 0.0000 + 0.5000 = 0.5455
\end{aligned}
\]

which is the same as the second line of Table 4. We can obtain Pr(z_3) by subtraction:

\[
\Pr(z_3) = 1 - \Pr(z_1) - \Pr(z_2)
\]

This "left-out" form of the REEP equations is inconvenient, in that it does not directly
produce a forecast for the left-out dummy variables except by subtraction. Moreover, it is not
amenable to subsequent eigenvalue analysis. Ordinarily, therefore, one subjects the left-out
matrix A to a procedure known as PLODITE (putting the left-out dummies in the equation). The
PLODITE algorithm produces a matrix of regression coefficients B that includes a row and column
corresponding to each left-out dummy variable.

A general PLODITE algorithm in the form of the FORTRAN program PLDT is provided in appendix
A. We can describe that algorithm as it applies to the problem of treating our 3x3 matrix A.

The first step is to identify the left-out variables and the rows and columns of B
corresponding to them. In the present case, z_3 has been left out, so we must supply a new column
b_{i,3} and row b_{3,j} in the matrix B of REEP coefficients:

\[
B =
\begin{bmatrix}
b_{0,0} & b_{0,1} & b_{0,2} & b_{0,3} \\
b_{1,0} & b_{1,1} & b_{1,2} & b_{1,3} \\
b_{2,0} & b_{2,1} & b_{2,2} & b_{2,3} \\
b_{3,0} & b_{3,1} & b_{3,2} & b_{3,3}
\end{bmatrix}
\]

The "zero row" of B is trivial; it contains a one in the first element and zeroes in the others.
The next two rows use the algorithm

\[
\begin{aligned}
b_{i,3} &= -a_{i,1} \bar{z}_{1,t_0} - a_{i,2} \bar{z}_{2,t_0} = Q && \text{for } i > 0 \\
b_{i,2} &= a_{i,2} + Q && \text{for } i > 0 \\
b_{i,1} &= a_{i,1} + Q && \text{for } i > 0 \\
b_{i,0} &= \bar{y}_i && \text{for } i > 0
\end{aligned}
\]

where \bar{z}_{j,t_0} is the mean of the jth predictor dummy, and \bar{y}_i is the mean of the ith
predictand and can be obtained from the first element of the final row of the left-out SSCP
matrix L_i by dividing the element by the number of observations N. The final row of B in each
variable category is obtained by summing the corresponding column elements in the rows above it,
using the rule that the b_{i,0} must add columnwise to one within each variable category, while
the other b_{i,j} must add to zero. Thus

\[
\begin{aligned}
b_{3,0} &= 1 - \sum_{i=1}^{2} b_{i,0} \\
b_{3,j} &= -\sum_{i=1}^{2} b_{i,j} && \text{for } j > 0
\end{aligned}
\]

For the first row, i = 1,

\[
\begin{aligned}
b_{1,3} &= -0.6212(15/48) - 0.3182(11/48) = -0.2670 = Q \\
b_{1,2} &= 0.3182 - 0.2670 = 0.0511 \\
b_{1,1} &= 0.6212 - 0.2670 = 0.3542 \\
b_{1,0} &= (15/48) = 0.3125
\end{aligned}
\]

When the same algorithm is applied to the row i = 2, the results are

\[
b_{2,3} = -0.1837, \qquad b_{2,2} = 0.3163, \qquad b_{2,1} = 0.0375, \qquad b_{2,0} = 0.2292
\]

By summation we obtain the row i = 3:

\[
b_{3,3} = 0.4508, \qquad b_{3,2} = -0.3674, \qquad b_{3,1} = -0.3917, \qquad b_{3,0} = 0.4583
\]

The complete matrix is

\[
B =
\begin{bmatrix}
1.0000 & 0.0000 & 0.0000 & 0.0000 \\
0.3125 & 0.3542 & 0.0511 & -0.2670 \\
0.2292 & 0.0375 & 0.3163 & -0.1837 \\
0.4583 & -0.3917 & -0.3674 & 0.4508
\end{bmatrix}
\]
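
The PLODITE step for a single variable category can be sketched as below. This is a minimal illustration of the rules described above, not the PLDT program of appendix A. The function name and argument layout are assumptions, and the left-out coefficients are entered as exact fractions (41/66 is approximately 0.6212, 7/22 approximately 0.3182, 73/330 approximately 0.2212) so that the rounded output can be compared with the text.

```python
from fractions import Fraction as F


def plodite_single_category(a_rows, predictor_means, predictand_means):
    """Expand left-out coefficients [a_i0, a_i1, ..., a_ik] into the full REEP matrix B."""
    k = len(a_rows)                       # number of retained dummies
    b = [[F(1)] + [F(0)] * (k + 1)]       # trivial "zero row"
    for a, ybar in zip(a_rows, predictand_means):
        # Q = -(a_i1 * mean(z_1) + ... + a_ik * mean(z_k)) is the new last column.
        q = -sum(a[j + 1] * predictor_means[j] for j in range(k))
        b.append([ybar] + [a[j + 1] + q for j in range(k)] + [q])
    # Row of the left-out dummy: first column sums to one, other columns sum to zero.
    last = [1 - sum(row[0] for row in b[1:])]
    last += [-sum(row[j] for row in b[1:]) for j in range(1, k + 2)]
    b.append(last)
    return b


A_ROWS = [[F(1, 22), F(41, 66), F(7, 22)],     # left-out equation for z_1
          [F(1, 22), F(73, 330), F(1, 2)]]     # left-out equation for z_2
MEANS = [F(15, 48), F(11, 48)]                 # means of z_1 and z_2 (Table 6)

B = plodite_single_category(A_ROWS, MEANS, MEANS)
B_ROUNDED = [[round(float(x), 4) for x in row] for row in B]
for row in B_ROUNDED:
    print(row)
```

In this example the predictor and predictand means coincide because the record begins and ends in the same state; in general they would be supplied separately.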

This is the matrix of REEP coefficients B used in the equivalent Markov model to produce a
forecast probability vector p from the vector of initial conditions z_{t_0}:

\[
\mathbf{p}_{t_0 + n \Delta t} = B^{n} \, \mathbf{z}_{t_0}
\]

Using B and its sixth power,

\[
B^{6} =
\begin{bmatrix}
1.0000 & 0.0000 & 0.0000 & 0.0000 \\
0.3125 & 0.0957 & 0.0862 & -0.1084 \\
0.2292 & 0.0632 & 0.0579 & -0.0721 \\
0.4583 & -0.1590 & -0.1441 & 0.1804
\end{bmatrix}
\]

we can prepare forecasts for 1 hour (n = 1) and 6 hours (n = 6). These equivalent Markov
forecasts are shown in Table 7, which is in the same form as Table 4 containing the classical
Markov forecasts. Note that the results are identical, demonstrating the equivalence of the REEP
technique and the classical method.

Table 7

EQUIVALENT MARKOV FORECASTS

                Initial State              Probability of Final State
Time          0    1    2    3          0         1          2          3

1-hour        1    1    0    0          1      0.6667     0.2667     0.0667
              1    0    1    0          1      0.3636     0.5455     0.0909
              1    0    0    1          1      0.0455     0.0455     0.9091

6-hour        1    1    0    0          1      0.4082     0.2924     0.2994
              1    0    1    0          1      0.3987     0.2871     0.3142
              1    0    0    1          1      0.2041     0.1571     0.6388
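
The entries of Table 7 can be checked by applying the REEP matrix repeatedly to a vector of initial dummies. The sketch below uses the four-decimal coefficients quoted in the text, so agreement is only to about three decimal places; the helper names are illustrative.

```python
# REEP coefficient matrix B, rounded to four decimals as printed in the text.
B = [[1.0000, 0.0000, 0.0000, 0.0000],
     [0.3125, 0.3542, 0.0511, -0.2670],
     [0.2292, 0.0375, 0.3163, -0.1837],
     [0.4583, -0.3917, -0.3674, 0.4508]]


def mat_vec(m, v):
    """Matrix times column vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def forecast(b, initial, n_steps):
    """Apply B n_steps times to the initial dummy vector [z_0, z_1, z_2, z_3]."""
    p = list(initial)
    for _ in range(n_steps):
        p = mat_vec(b, p)
    return p


ONE_HOUR = forecast(B, [1, 0, 1, 0], 1)   # initial state 2, 1-hour forecast
SIX_HOUR = forecast(B, [1, 1, 0, 0], 6)   # initial state 1, 6-hour forecast
print(["%.4f" % x for x in ONE_HOUR])
print(["%.4f" % x for x in SIX_HOUR])
```

The leading element of every forecast remains one, which is the role of the trivial zero row of B.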

In this section, we have demonstrated how an equivalent Markov probability generating function
in the form of a REEP matrix of regression coefficients B can be created from historical weather
data and applied to a practical forecasting problem. A more extensive application of this
technique is given in chapter 10. Our analysis is not complete at this point, however. The
probability generating function B is rich in information regarding how and under what conditions
the weather changes, the degree to which it is predictable for different forecast periods, how
strongly climatology contributes to the forecast probabilities for various forecast lengths, and
the rate at which these forecast probabilities decay toward the vector of climatology. A part of
this information can be elicited from B by eigenvalue analysis of the type discussed above for
obtaining powers of a matrix. In the following section, we will analyze the matrix B developed
from this example by the method of eigenvalues.

13. Hierarchical Matrix Analysis of the Equivalent Markov Model.


The Markov chains encountered in practical weather forecasting problems are completely
ergodic in the sense that their limiting state probability distributions are independent of the
initial conditions. In section 8 above, dealing with the classical Markov model, we saw this as
a result of the fact that Markov transition matrices for weather forecasting problems are regular,
bringing about a stationary distribution of the Markov chain after sufficiently many steps n.
The forecast probabilities produced by the Markov model will, with increasing length of the
forecast period, converge toward the steady state vector of a priori climatological probabilities
as the system gradually "forgets" its initial state.
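
This loss of memory can be illustrated with the Kelly transition matrix of Table 3: whatever the initial state, repeated application of P leads to the same probability vector, which here equals the climatological frequencies 15/48, 11/48 and 22/48. A minimal sketch, with the step count of 200 chosen arbitrarily:

```python
# Transition probabilities from the counts in Table 3.
P = [[10 / 15, 4 / 15, 1 / 15],
     [4 / 11, 6 / 11, 1 / 11],
     [1 / 22, 1 / 22, 20 / 22]]

CLIMATOLOGY = [15 / 48, 11 / 48, 22 / 48]


def step(probabilities, transition):
    """Advance a row vector of state probabilities by one time step."""
    size = len(transition)
    return [sum(probabilities[i] * transition[i][j] for i in range(size))
            for j in range(size)]


def limiting_distribution(start, transition, n_steps=200):
    """State probabilities after n_steps transitions from a given start vector."""
    p = list(start)
    for _ in range(n_steps):
        p = step(p, transition)
    return p


LIMITS = [limiting_distribution(start, P)
          for start in ([1, 0, 0], [0, 1, 0], [0, 0, 1])]
for limit in LIMITS:
    print(["%.4f" % x for x in limit])
```

All three starting states give the same limiting vector, which is the sense in which the chain is completely ergodic.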

Convergence toward the steady climatological state, an attractive feature of the classical
Markov model, is also seen in the equivalent Markov model. Although the matrix B of REEP
coefficients is not regular, nevertheless increasing powers of B do converge to the zero-order
hierarchical form C_0 having as its first column the climatological expectation of each of the
predictor dummy variables and having zero in all other positions:

\[
C_0 =
\begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
\bar{z}_1 & 0 & 0 & \cdots & 0 \\
\bar{z}_2 & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
\bar{z}_P & 0 & 0 & \cdots & 0
\end{bmatrix}
\qquad \text{for } P = m - 1
\]


As it turns out, the matrix of climatology C_0 and all the other hierarchical forms, which
represent mean-departure terms, are given by the eigenvalue method for obtaining powers of the
probability generating function B:

\[
B^{n} =
\underbrace{\frac{\lambda_0^{n}}{y_0 x_0} C_0}_{\text{Mean Term ("Climatology")}}
+ \underbrace{\frac{\lambda_1^{n}}{y_1 x_1} C_1 + \frac{\lambda_2^{n}}{y_2 x_2} C_2 + \cdots
+ \frac{\lambda_{m-1}^{n}}{y_{m-1} x_{m-1}} C_{m-1}}_{\text{Mean-Departure Terms}}
\]

where C_0 = x_0 y_0, C_1 = x_1 y_1, etc., and where m is the dimensionality of the m x m matrix B.
Here each product x_i y_i is an m x m matrix, while each product y_i x_i is a scalar. The
total number of dummy variables P is one less than m due to the presence of z_0. λ_i is the ith
eigenvalue, x_i its associated "left" eigenvector, and y_i the associated "right" eigenvector.
The C_i are the hierarchical matrices of order i and have the same dimensions as the m x m matrix B.
Counting C_0, there are formally m hierarchical forms C_i.

The algorithm is the same as that given in section 10 above, except the index i has in this
case been started at zero to emphasize the special role of C_0 as a zero-order term, being the
climatological expectation or a priori, unconditional probability of occurrence of each of the
dummy variables.

The multipliers of each of the hierarchical forms are scalars, which we can consider
weighting factors:

\[
w_i = \frac{\lambda_i^{n}}{y_i x_i}
\]

Therefore,

\[
B^{n} = w_0 C_0 + w_1 C_1 + w_2 C_2 + \cdots + w_{m-1} C_{m-1}
\]

where w_0 is always unity. The weighting factors w_i for i > 0 become small with increasing forecast
period (larger n). At sufficiently large n, therefore, the contribution from the higher order
terms i > 0 becomes negligible, and B^n converges to climatology, i.e., C_0. Mathematically,

\[
\lim_{n \to \infty} B^{n} = \lim_{n \to \infty} \sum_{i=0}^{m-1} w_i C_i = w_0 C_0 = C_0
\]

We can see this in the Kelly AFB cloud ceiling example by subjecting the 4x4 probability
generating function B for that problem to hierarchical eigenvalue analysis. The results, in terms
of hierarchical matrices and weighting factors, are shown in Table 8.

Table 8

EIGENVALUE ANALYSIS OF REEP COEFFICIENT MATRIX B
FOR KELLY AFB, TX, CEILING PROBLEM

         | 1.0000   0.0000   0.0000   0.0000 |
         | 0.3125   0.0000   0.0000   0.0000 |
C_0  =   | 0.2292   0.0000   0.0000   0.0000 |        w_0 = 1
         | 0.4583   0.0000   0.0000   0.0000 |

         | 0.0000   0.0000   0.0000   0.0000 |
         | 0.0000   0.5292   0.4797  -0.6007 |
C_1  =   | 0.0000   0.3518   0.3189  -0.3993 |        w_1 = (0.8327)^n / 1.8481
         | 0.0000  -0.8810  -0.7987   1.0000 |

         | 0.0000   0.0000   0.0000   0.0000 |
         | 0.0000   0.6706  -0.9562   0.0209 |
C_2  =   | 0.0000  -0.7012   1.0000  -0.0219 |        w_2 = (0.2885)^n / 1.6715
         | 0.0000   0.0307  -0.0438   0.0010 |
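
The expansion of B^n in hierarchical forms can be checked numerically against the entries of Table 8. In the sketch below, the eigenvalues (1, 0.8327, 0.2885) and the scalar divisors (1, 1.8481, 1.6715) are read from Table 8; interpreting the two numbers beside each matrix in this way is an assumption, supported by the fact that the weighted sum reproduces B and B^6 to about three decimal places.

```python
C0 = [[1.0000, 0, 0, 0], [0.3125, 0, 0, 0], [0.2292, 0, 0, 0], [0.4583, 0, 0, 0]]
C1 = [[0, 0, 0, 0],
      [0, 0.5292, 0.4797, -0.6007],
      [0, 0.3518, 0.3189, -0.3993],
      [0, -0.8810, -0.7987, 1.0000]]
C2 = [[0, 0, 0, 0],
      [0, 0.6706, -0.9562, 0.0209],
      [0, -0.7012, 1.0000, -0.0219],
      [0, 0.0307, -0.0438, 0.0010]]

FORMS = [C0, C1, C2]
EIGENVALUES = [1.0, 0.8327, 0.2885]
DIVISORS = [1.0, 1.8481, 1.6715]      # the scalars y_i x_i


def weight(i, n):
    """Weighting factor w_i = lambda_i**n / (y_i x_i) for an n-step forecast."""
    return EIGENVALUES[i] ** n / DIVISORS[i]


def power_of_b(n):
    """B**n assembled as the weighted sum of the hierarchical forms."""
    return [[sum(weight(i, n) * FORMS[i][r][c] for i in range(3))
             for c in range(4)] for r in range(4)]


B1 = power_of_b(1)
B6 = power_of_b(6)
for row in B6:
    print(["%.4f" % x for x in row])
```

Once the forms C_i are known, any power of B costs only a few scalar exponentiations, which is the computational attraction of the method.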

As the data in Table 6 or the first row of the Crout auxiliary in the preceding section
confirm, C_0 indeed turns out to be the matrix of climatology, i.e., the predictor means:*

\[
\begin{bmatrix} 1.0000 \\ 0.3125 \\ 0.2292 \\ 0.4583 \end{bmatrix}
=
\begin{bmatrix} \bar{z}_{0,t_0} \\ \bar{z}_{1,t_0} \\ \bar{z}_{2,t_0} \\ \bar{z}_{3,t_0} \end{bmatrix}
\]

*As explained above, only in cases like the Kelly example, where each Markov state corresponds
to exactly one binary variable, can we expect to find exact agreement between the classical Markov
and equivalent Markov models. In models containing more than one category of variable (e.g.,
visibility as well as ceiling), the equivalent Markov model neglects certain Boolean combinations
of predictors that would form Markov states in the classical model. Depending on the additivity
characteristics of the variables selected for inclusion in the model, this neglect of Boolean
combinations has a greater or lesser effect on the faithfulness with which the equivalent Markov
model reproduces the behavior of the classical model. One immediate consequence of the lack of
exact correspondence between the equivalent and classical techniques is seen in C_0, which will not
give the predictor means exactly except in simple cases like the Kelly example. In general, the
climatological means will be well approximated by C_0, however. In fact, any significant departure
of C_0 from climatology is evidence of non-additivity and gives us warning to examine closely the
model's performance on independent data.


Although the matrix B is 4 x 4 in size, it contains one dependent row (and column).
Accordingly, the number of non-zero eigenvalues is 4 - 1 = 3, and there is no hierarchical form C_3.

How long, i.e., to how great a value of n, should we retain terms w_1 C_1 and w_2 C_2? If we
require accuracy to the third decimal place in B^n, then any variation in the third decimal place
of the weighting factors w_i is necessarily significant. In this case, such a requirement results
in keeping the first-order term C_1 until n exceeds 34, and the second-order term C_2 until n
exceeds five. Thus, in obtaining powers of B beyond the first few, substantial computational
savings are possible by abbreviating the matrix polynomial to be evaluated. More important, we
see that the first-order mean departure term has in this case a noticeable effect for at least
34 hours, while the second-order mean departure term becomes negligible after five hours. In
other words, if our criterion for convergence to climatology is agreement in the third decimal
place, we can say the system converges to climatology in roughly 36 hours. This can be thought
of as the "time to stationarity" τ. Only during this time can the Markov model forecast
probabilities other than the long term climatological expectation. Beyond τ, the model
forecasts only climatology.
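
The retention limits quoted above can be reproduced from the weighting factors of Table 8. The sketch assumes that "significant in the third decimal place" means a weighting factor of at least 0.001; that threshold, like the function name, is an interpretation adopted for illustration rather than a rule stated in the text.

```python
def last_significant_step(eigenvalue, divisor, threshold=0.001, max_steps=1000):
    """Largest n for which w(n) = eigenvalue**n / divisor is still >= threshold."""
    n = 0
    # w(n) decreases monotonically for 0 < eigenvalue < 1, so stop at the first failure.
    while n < max_steps and eigenvalue ** (n + 1) / divisor >= threshold:
        n += 1
    return n


FIRST_ORDER = last_significant_step(0.8327, 1.8481)    # term w_1 C_1
SECOND_ORDER = last_significant_step(0.2885, 1.6715)   # term w_2 C_2
print(FIRST_ORDER, SECOND_ORDER)
```

Under this assumption the first-order term is retained through n = 34 and the second-order term through n = 5, in agreement with the figures given above.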

There likely exist better criteria than agreement to an arbitrary number of decimal places
for establishing τ and for deciding when terms in the matrix polynomial become negligible. One
such criterion would be to retain only such terms w_i C_i for a particular forecast period n Δt
that contribute significantly to improving the Brier P-score. Thus the cutoff for each term
could be obtained by verifying the equivalent Markov model on independent data for forecasts of
length n Δt. Trial forecasts could be made and verified using terms through first order (0, 1),
then through second order (0, 1, 2), then (0, 1, 2, 3), etc., until added terms no longer
significantly improve the P-score. We note that there is no need to verify the zero-order term
alone, since the P-score for a forecast of pure climatology can be calculated analytically
(Brier and Allen, 1951):

\[
P_{\text{climo}} = 1 - \sum_{j=1}^{P} \bar{z}_j^{\,2}
\]

where \bar{z}_j is the mean of the jth dummy variable, and where there are P such dummies. Obviously
it is not possible to calculate analytically the Brier P-scores for the classical or equivalent
Markov models themselves, since these scores depend on the representativeness, robustness and
inherent predictive skill of the particular model developed. Good models will produce good
P-scores, and bad models will do badly. Nor is it even possible to prescribe analytically the
relative improvement in P-score brought about by retention of each hierarchical form. If the
improvement could be calculated, then we could obtain the actual P-scores by using P_climo as
the baseline P-score. Since we have already shown logically that the absolute P-scores cannot be
determined analytically, it follows that the relative improvement in P is likewise not obtainable
by analytical means.
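
As a check on the idea that the climatological score needs no verification run, the sketch below evaluates it for the Kelly data in two ways: analytically from the category means, and by direct averaging over the 48 verifying observations. It assumes the multi-category form of the Brier score, the mean over observations of the squared forecast errors summed over the dummies; for mutually exclusive and exhaustive dummies this reduces to one minus the sum of squared means. The exact form used by Brier and Allen (1951) is not reproduced in the text, so this is an assumption.

```python
from fractions import Fraction

# Verifying (final) states for the 48 observations of Table 5.
FINAL_STATES = ([1] * 6 + [2] * 3 + [3] * 11 + [2] * 5
                + [1, 1, 2, 1, 2, 1, 1, 2, 1, 1] + [3] * 11 + [1, 1])

N = len(FINAL_STATES)
MEANS = [Fraction(FINAL_STATES.count(k), N) for k in (1, 2, 3)]

# Analytical score of a pure climatology forecast.
P_ANALYTIC = 1 - sum(m * m for m in MEANS)

# The same score obtained by verifying the constant forecast against each observation.
P_DIRECT = sum(
    sum((MEANS[k - 1] - (1 if state == k else 0)) ** 2 for k in (1, 2, 3))
    for state in FINAL_STATES
) / N

print(float(P_ANALYTIC), float(P_DIRECT))
```

Both routes give the same value, about 0.64 for this small sample, which would serve as the no-skill baseline.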

Whatever the exact method of obtaining the time to stationarity τ, we can always regard
τ as an indirect measure of the potential skill of a Markov or equivalent Markov model. In
general terms, the skill of a forecast technique is its ability to improve over some baseline
objective forecast method such as random chance (forecasting the weather by dice), persistence
(forecasting the weather will not change), or climatology (forecasting the mean). Using
climatology as our no-skill baseline, we can see that a Markov prognostic scheme that converges to
climatology in a time τ < Δt will never make anything other than a climatological forecast. Such
a scheme can never exhibit skill relative to climatology. In our Kelly example, on the other hand,
the probability generating function B^n does not converge to climatology to the third decimal place
for about 36 hours. In other words, the model has 36 hours in which to make forecasts that differ
somewhat from climatology, for better or for worse depending on the effectiveness of the model.

The longer the time to stationarity τ, the greater is the "skill opportunity" in the prognostic
model. This is not to say, of course, that models with long τ will necessarily be more skillful
than those with short τ. The Kelly equivalent Markov model is a case in point. Despite its
relatively long τ, this model would not likely show much skill in a test using independent data,
since it was developed from an inadequate dependent data set. Nevertheless, if two Markov models
are comparable in other respects, it can be expected that the one with longer time to stationarity
τ will exhibit greater skill than the other relative to climatology.

A hierarchical matrix analysis such as that shown in Table 8 can sometimes be used to gain
added physical insight into the operation of the prognostic technique in terms of the atmospheric
variables whose behavior is being modeled. Consider, for example, the third column of the
hierarchical matrices in Table 8. This column contributes to the forecast probabilities only if
the observed ceiling is z_2, i.e., 3,000 - 15,000 ft. In cases where z_2 is observed, the effect
of element (3,3) of C_2 is to give added weight to persistence of z_2 in the short range, whereas
elements (2,3) of C_2 and (4,3) of C_1 sharpen the probabilistic persistence forecast by
decreasing the likelihood of weather change. Since C_1 is felt considerably more strongly than
C_2 because of its larger weighting factor, the effect of (4,3) of C_1 dominates (2,3) of C_2, and
the second most likely short range event (after persistence, the most likely) is deterioration of
the weather from z_2 to z_1. Note that the heavy short range weight on persistence (3,3) of C_2
must yield after several hours to a favored deterioration of the weather, shown by the balance
between (2,3) and (4,3) of C_1. Only the greater a priori probability of good weather, shown in
(4,1) of C_0, prevents C_1 from making a "landslide prediction" of z_1, the lowest ceiling.
Nevertheless, C_1 is sufficiently strong to tilt the forecast probabilities in the direction of
z_1 for several hours (see the six hour forecast in Table 7 and the displayed forecast
probabilities in Figure 4).

This sort of analysis is shown in graphical form in Figures 1-4, which display the forecast
probabilities as a function of time for ceiling categories $Z_1$, $Z_2$ and $Z_3$ based on an initial
condition of $Z_2$. Moreover, in Figures 1-3, the probability contributions due to the hierarchical
matrix forms $C_0$, $C_1$ and $C_2$ are also shown as a function of time. Clearly, the transient effect
of persistence, which causes $Z_2$ to be the most probable category for the first 1 1/2 hours, is
mostly due to $C_2$, the most "short lived" hierarchical form. The forecast of lowering ceiling,
which holds from t = 1.5 to t = 8.5, is due principally to $C_1$, with the decreasing
probability of $Z_1$ in the first two hours being due to $C_2$. The convergence of the forecast
probabilities toward climatology, $C_0$, operates at all times and is seen to dominate beyond 3 hours.

Thus we see in $C_2$ very short range "probability forces" at work favoring persistence. In
the present case, these forces are operating on a time scale of several hours. In $C_1$ we see
longer range "forces" at work over time scales of approximately a half day. It is from this
source that the predicted deterioration of the weather emerges. Finally, we see in $C_0$ that
climatology, operating most strongly beyond 12 hours, exerts an influence toward improving the
ceiling.


[Figure: forecast probability at Kelly AFB as a function of time t (hours).]


14. The Ornstein-Uhlenbeck Process: Continuous Variates.


The classical Markov and equivalent Markov models require discretization
of predictors and predictands into several classes, which we refer to as dummy or binary variables.
Some atmospheric variables (such as the "present weather" elements, tornado, hail, thunder, etc.)
are quite naturally treated in terms of dummy variables, which may be either "on" or "off."
Other atmospheric elements such as temperature, pressure and the like are distributed continuously,
and any effort to categorize them involves some loss of information. The classical Markov and
equivalent Markov models trade decreased resolution in some of the variates for the important
ability to include more than one variable in the prediction scheme. These additional predictors,
if selected appropriately, impart increased prognostic skill to the classical and equivalent
Markov models, particularly in forecasting the weather beyond 3-4 hours in the future.


For some purposes, particularly forecasting very short range weather changes on the order of
minutes to 1 - 2 hours, the decreased resolution in the variate to be forecast might not always
be remunerated by the increased forecasting skill provided by use of additional predictors. This
is physically reasonable. We would naturally expect the probability density function of the
visibility 15 minutes from now to be influenced more by the present visibility than by some other
predictor such as dewpoint depression or even wind speed. Under these circumstances, the prognostic
scheme used to make the 15-minute forecast might do better to make use of a thorough observation
of the present visibility than to incorporate a host of additional predictors. In a case
such as this, a Markov process involving a continuous variate might prove more skillful than either
the classical or equivalent Markov techniques.


Gringorten (1966, 1968, 1971, 1972) has adapted for meteorological use a special class of the
Markov chain called the Ornstein-Uhlenbeck process in which a single, continuous variate serves
as both predictor and (through a time lag) predictand. Hering and Quick (1974) have employed
this model with remarkable success in forecasting the atmospheric extinction coefficient
(m$^{-1}$) for 15, 30, 60 and 180 minutes in the Air Force Geophysics Laboratory's Mesonet
Experiment. The success of this single station forecasting method warrants its discussion here.


Assuming a first-order Markov process, in which the outcome of trial n depends at most on the
outcome of trial n-1,* in the stationary Ornstein-Uhlenbeck process the normalized variable y
(having mean of zero and variance of 1) exhibits serial correlation given by

$$\rho_{n \Delta t} = e^{-a\, n \Delta t}$$

where $\rho_{n \Delta t}$ is the so called serial correlation coefficient or autocorrelation
coefficient between observations y(t) and y(t + $n \Delta t$) separated by the time interval
$n \Delta t$, which is any multiple of the basic unit of time $\Delta t$ characteristic of the
problem ($\Delta t$ is the unit in which time is measured, e.g., 1 hour). The constant a need not
be expressly determined, as it does not appear in the final equation.

Let $\rho_0$ be the correlation coefficient between observations y(t) and y(t + $\Delta t$) separated
by one unit of time $\Delta t$. Then,

$$\rho_0 = \rho_{\Delta t} = e^{-a \Delta t}$$

If the process is Markov, then the correlation coefficient between observations y(t) and
y(t + $n \Delta t$) separated by an arbitrary interval of time $n \Delta t$, where n may be
fractional, is given by

$$\rho = \rho_0^{\,n}$$

Under the Ornstein-Uhlenbeck stochastic process, the value $y_t$ of the continuous variate at
time t is related to an earlier value $y_0$ by the expression,

$$y_t = \rho\, y_0 + \sqrt{1 - \rho^2}\; p$$

where p is a normalized probability and where $\rho$ is the correlation between $y_t$ and $y_0$
separated by the interval $n \Delta t$.


*Hering (1977) reports that experimentation with higher order Markov processes involving the
outcome of n-2, n-3, etc., in the Ornstein-Uhlenbeck context did not produce significant
improvement in the performance of the forecast scheme.

The Ornstein-Uhlenbeck model presented here has been tested and given extensive use in the
Air Force Geophysics Laboratory's Mesonet Experiment, where it has been used both as guidance for
subjective forecasters and as a control against which to measure the skill of various forecasting
methods. As applied in the Mesonet Experiment, the model forecasts the extinction coefficient
$\sigma$ (m$^{-1}$) in the form of the normalized variable y, where at initial time

$$y_0 = k \ln \sigma_0 + f$$

The extinction coefficient at initial time $t_0$ is $\sigma_0$, and k and f are coefficients that
depend on time of day and season. Details are available in Tahnk (1975). The unit of time in this
application is 1 hour, so time is given as

$$t = n \Delta t$$

in hours (0.25 hours to 3 hours). Therefore, the Ornstein-Uhlenbeck model of a first-order
Markov process is

$$y_t = \rho_0^{\,t}\, y_0 + \sqrt{1 - \rho_0^{\,2t}}\; p$$

where p is a normalized probability that a certain threshold $y_c$ will be exceeded and is obtained
indirectly from the climatological record as explained in Gringorten (1972). The autocorrelation
$\rho_0$ is also determined from data but may be "tuned" to improve model performance.

The model equation, as written, produces a forecast of the most probable normalized extinction
coefficient $y_t$ for time t based on exponential decay of the autocorrelation coefficient with time.
But because y is a normalized variable with mean of zero and variance of one, tables of the normal
probability integral can be consulted or the normal distribution integrated numerically to obtain
the probability of exceeding selected operationally significant thresholds. Once the "constants"
have been determined, this model requires only an initial, uncategorized observation of the
parameter being forecast in order to estimate the most probable future value $y_t$ and various
"exceedance" probabilities. If $\sigma$ is observed continuously, the computer can make available
continuous forecasts of $y_t$.
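
The forecast step just described can be sketched in a few lines. The sketch below is only an illustration under stated assumptions: it uses the standard property of a stationary, normalized first-order Gaussian process that, given $y_0$, the later value is normally distributed with mean $\rho y_0$ and variance $1 - \rho^2$, with $\rho = \rho_0^{\,t}$. The function name `ou_forecast` and all numerical values are hypothetical, and the conversion from extinction coefficient to y (the coefficients k and f) is not included.

```python
import math

def ou_forecast(y0, rho0, t_hours, y_c):
    """Return (central forecast of y_t, probability that y_t exceeds y_c).

    y0      : normalized initial observation (zero mean, unit variance)
    rho0    : autocorrelation at a lag of one time unit (assumed 1 hour)
    t_hours : lead time in hours, must be > 0; may be fractional
    y_c     : normalized threshold of interest
    """
    rho = rho0 ** t_hours                    # exponential decay of the correlation
    center = rho * y0                        # conditional mean of y_t given y0
    spread = math.sqrt(1.0 - rho * rho)      # conditional standard deviation
    z = (y_c - center) / spread              # standardized distance to the threshold
    prob_exceed = 0.5 * math.erfc(z / math.sqrt(2.0))   # upper tail of the normal
    return center, prob_exceed

# Illustrative values only (not taken from the Mesonet Experiment).
for lead in (0.25, 0.5, 1.0, 3.0):
    center, prob = ou_forecast(y0=1.2, rho0=0.9, t_hours=lead, y_c=0.5)
    print(f"t = {lead:4.2f} h  central y_t = {center:6.3f}  Pr(y_t > y_c) = {prob:5.3f}")
```

As the lead time grows, the central forecast relaxes toward zero (the climatological mean of the normalized variable) and the exceedance probability relaxes toward its climatological value, which is the "forgetting" behavior of the Markov model.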

Hering and Quick (1974) report, based on the first year of the model's use in the Mesonet
Experiment, that the model proved difficult for conventional forecasters to beat in 15-minute,
30-minute, 1-hour and 3-hour forecasts, especially in the 15- and 30-minute forecasting.
Forecasters equipped with Mesonet data could make about a 10 percent improvement in Rank Probability
Score over the Markov model in 15- and 30-minute forecasting but could not sustain this
improvement at 1 and 3 hours. In 3-hour forecasting, the Markov model actually beat the conventional
forecasters, who were denied Mesonet information. Percentage improvement in Rank Probability
Score above the score produced by forecasts of pure climatology was 71, 60, 39 and 23 percent
for 15-minute, 30-minute, 1-hour and 3-hour forecasts, respectively. The Markov probabilities
were much less biased toward pessimism than were the subjective forecasts. Because the Markov
model tends to dampen a perturbation with time as the model gradually "forgets" its initial
state, the Markov probabilities were not as sharply cast as were the subjective probabilities.

A model such as this is limited by its lack of "additional" predictors, i.e., elements other
than that being forecast. The simple exponential decorrelation that forms the basis of the model
almost never produces a forecast of worsening weather. Under these circumstances, it is
interesting to speculate how much improvement in forecast performance could be realized by introducing
models containing more meteorology than the Ornstein-Uhlenbeck scheme provides. Tahnk (1975)
reports the results of a test in which the performance of two regression-based models was compared
with that of the Ornstein-Uhlenbeck model. One of the regression models was a REEP prediction
scheme not much different from the REEP-based equivalent Markov model discussed earlier in this
chapter. The other competing model was based on classical multiple linear regression equations
with continuous variables, the predictors being selected by a stepwise scheme. Neither of the
competing schemes was limited to single-station predictors, and in fact both regression schemes
heavily chose network predictors* in preference to single station predictors during model
development.

In tests using independent data, the REEP-based technique proved much more skillful than the
Ornstein-Uhlenbeck model in 15-, 30- and 60-minute forecasting, producing Heidke skill scores
of 6.8%, 19.2% and 13.4% relative to Ornstein-Uhlenbeck for the respective forecast periods
indicated. The stepwise regression scheme using continuous variables did about as well as the
REEP technique at 15-minute forecasting but showed negative skill relative to the
Ornstein-Uhlenbeck model at 30 and 60 minutes.


*See section 16 of this chapter.

Stepwise did better than REEP at forecasting the onset of worsening conditions but did so at
the expense of a high onset false alarm rate. Both regression models did better than the
Ornstein-Uhlenbeck at forecasting onsets. Not unexpectedly, all three models found forecasting
improving weather a much easier task than forecasting deterioration. Nevertheless, the stepwise
regression scheme hardly did better than the Ornstein-Uhlenbeck model at this task, whereas the
REEP technique picked up almost twice as many of the improving situations as did either of the
other methods.

Tahnk (1975) made a special study of the usefulness of the REEP technique in cases of radiation
fog only and found that in this more difficult situation REEP was less skillful than the
Ornstein-Uhlenbeck model at 15-minute forecasting (-4.1% Heidke skill score). At 30 and 60
minutes, however, REEP beat Ornstein-Uhlenbeck by a wide margin (25.6 and 38.9%, respectively).


In a final experiment, Tahnk (1975) limited the REEP technique to selection of single
station predictors only, i.e., Mesonet observations for the Hanscom AFB reservation itself.
The rederived REEP equations were then tested relative to the Ornstein-Uhlenbeck model using
independent data. The test was not restricted to radiation fog cases. The restricted REEP
model showed large negative skill relative to the Ornstein-Uhlenbeck technique for all forecast
periods. It thus appears important, at least in the mesoscale context, to allow the REEP model
to select network predictors if these are available. At the synoptic scale, use of network
predictors may not offer correspondingly large improvements in forecasting skill, although at
least some gain is generally achieved by including network predictors. Chapter 10 of this
volume presents a synoptic scale forecasting experiment in which the equivalent Markov or REEP
technique is limited to single station predictors yet shows appreciable forecasting skill.

15. Ancillary Models.

a. General.

More often than not, the user of weather information needs more than a simple forecast,

$$\Pr(Z_0, Z_1, Z_2, \ldots, Z_p)_t$$

of the probability of the weather at some future time t. Reconnaissance mission planners,
tactical commanders and even construction project superintendents are likely to want to know, for
example, "What's the probability the weather will be good 24 hours from now and will stay good for
six hours after that?"*

In fact, a well posed question such as this defines the weather element of the customer's
mission success indicator (MSI), provided an objective meaning can be found for the customer's
term, "good weather." More important, the manner in which the question is posed gives away the
secret of its solution. Let us write the question graphically:


What is the probability that

    [ the weather will be good 24 hours from now ]

and

    [ the weather will stay good for 6 hours after that ]


We can write this in a slightly more mathematical form, using Pr(M) as the probability that the
weather will be good enough for the mission to proceed.

$$\Pr(M) = q(W = G \text{ at } t_1 = t_0 + 24) \cdot p(W = G \text{ at } t_2 \text{ thru } t_7 \mid W = G \text{ at } t_1)$$

where both q and p are probabilities, W represents the "weather," G stands for "good," and

$$t_n = t_1 + (n - 1)\, \Delta t$$

for $\Delta t$ = 1 hour.


We say that the customer's question gives away the solution to his problem because both the
question (see the graphical form) and its mathematical statement are in two parts. We need both
the forecast probability that the weather will be good at $t_1$, i.e.,

$$p(W = G \text{ at } t_1)$$

and the continuous run probability, i.e.,

$$p(W = G \text{ at } t_2 \text{ thru } t_7 \mid W = G \text{ at } t_1)$$

of good weather for six more hours.

*This is simply one example of the many such questions that can be posed under actual
operational circumstances.

The models described in this chapter and elsewhere in the volume address themselves for the
most part to the classical forecasting problem of obtaining p(W = G at $t_1$) for some future time
$t_1$. The second part of problems such as that presented above is often best solved by methods
different from but related to those used for the first part. We can conveniently consider here only a
few of the methods developed in the literature. We will call these ancillary models because they
are best suited to the "second part" of weather support problems such as that described above. In
keeping with the subject matter of this chapter, only single station methods of the Markov
type will be treated.
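
The two-part structure of the customer's question can be illustrated numerically. The following is a minimal sketch only: it assumes, purely for illustration, that the run of good weather is governed by a stationary first-order two-state Markov chain with hourly steps, so that the continuous run probability is the hourly persistence probability raised to the number of additional hours. The function name and all numbers are hypothetical.

```python
def mission_probability(q_good_at_t1, p_good_given_good, run_hours):
    """Pr(M) = (forecast probability of good weather at t1)
             * (probability that good weather then persists run_hours more hours).

    The run probability is modeled here as p(G|G) ** run_hours, which assumes a
    first-order Markov chain with a constant hourly persistence probability.
    """
    run_probability = p_good_given_good ** run_hours
    return q_good_at_t1 * run_probability

# Hypothetical inputs: 80% chance of good weather at t1, 95% hourly persistence.
pr_m = mission_probability(q_good_at_t1=0.80, p_good_given_good=0.95, run_hours=6)
print(f"Pr(M) = {pr_m:.3f}")
```

Even with a fairly confident forecast for $t_1$, the requirement of an unbroken run lowers the mission probability appreciably, which is why the second factor deserves models of its own.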

b.  Continuous  Run  Probabilities  by  Markov  Process:  Hot  and  Cool  Spells  or  Wet  and  Dry  Periods. 

When we are interested in particular sequences of Markov states, such as the probability of
a cool day (C) following three consecutive hot days (H), i.e., $\Pr(H_1, H_2, H_3, C_4)$, we need not
construct the whole Markov transition matrix because we are interested only in certain elements
of the matrix. Sakamoto (1970) found that under a first order Markov process, the probability
of a particular sequence of Markov states $a_j$ is given by

$$\Pr(a_1, a_2, a_3, \ldots, a_m) = q(a_1)\, p(a_2 \mid a_1)\, p(a_3 \mid a_2) \cdots p(a_m \mid a_{m-1})$$


where the subscripts refer to the sequence number of the Markov trial and where q represents the
initial probability. In the particular case of three consecutive hot days followed by a cold
day, the algorithm is

$$\Pr(H_1, H_2, H_3, C_4) = q(H_1)\, p(H_2 \mid H_1)\, p(H_3 \mid H_2)\, p(C_4 \mid H_3)$$

Obviously, the conditional probabilities $p(a_j \mid a_{j-1})$ are first order Markov transition
probabilities, given in this case by

                            Final State
                            C                    H

    Initial       C         p_CC = p(C|C)        p_CH = p(H|C)
    State         H         p_HC = p(C|H)        p_HH = p(H|H)

for the two-state hot/cold system.

In a first order Markov model, the outcome of trial n is presumed to depend only on the
outcome of trial n - 1 immediately preceding. In a second order Markov model, trial n is affected
not only by n - 1 but also by n - 2. Sakamoto (1970) shows that a second order Markov chain for
the same example would be written as

$$\Pr(H_1, H_2, H_3, C_4) = q(H_1)\, p(H_2 \mid H_1)\, p(H_3 \mid H_2, H_1)\, p(C_4 \mid H_3, H_2)$$

where the conditional probabilities can involve as many as two days prior to the day of interest.
A third order chain would be

$$\Pr(H_1, H_2, H_3, C_4) = q(H_1)\, p(H_2 \mid H_1)\, p(H_3 \mid H_2, H_1)\, p(C_4 \mid H_3, H_2, H_1)$$

involving the conditional probability that day m will be cold given that days m - 1, m - 2 and
m - 3 were hot.
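
The first order sequence probability above is simply a running product, as the following sketch shows. The initial and transition probabilities are invented for illustration and are not Sakamoto's values; the function name is hypothetical.

```python
from itertools import product

def sequence_probability(states, initial, transition):
    """Probability of an ordered sequence of states under a first order chain.

    initial[a]       : q(a), probability of state a on the first day
    transition[a][b] : p(b | a), probability of state b on the day after state a
    """
    prob = initial[states[0]]
    for previous, current in zip(states, states[1:]):
        prob *= transition[previous][current]
    return prob

# Hypothetical two-state hot/cold system (each row of `transition` sums to one).
initial = {"H": 0.3, "C": 0.7}
transition = {
    "H": {"H": 0.6, "C": 0.4},
    "C": {"H": 0.2, "C": 0.8},
}

# Three consecutive hot days followed by a cold day: q(H) p(H|H) p(H|H) p(C|H).
p_hhhc = sequence_probability(["H", "H", "H", "C"], initial, transition)

# Sanity check: the probabilities of all 16 four-day sequences add up to one.
total = sum(sequence_probability(seq, initial, transition)
            for seq in product("HC", repeat=4))
print(p_hhhc, total)
```

Only the transition elements that appear in the sequence are used, which is the point made in the text: the full matrix need not be constructed when a single run is of interest.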
Sakamoto found that the manner in which the Markov states, "cold day" and "hot day," were
defined affects the performance of the resulting Markov chain in modeling the continuous run
probabilities (e.g., $\Pr(H_1, H_2, H_3, C_4)$). In fact, choice of a threshold used to define a hot or
cold day proved to be a factor in determining the suitability of a particular order of the Markov
chain model. Inappropriate thresholds may force the use of a second order chain, an eventuality
that might be avoided by use of thresholds better suited to the prediction problem at hand.

Use  of  arbitrary  temperatures  (e.g.,  86°F,  89°F,  ...)  to  define  hot  and  cold  days  proved 
unsuitable  in  the  context  of  either  a first  order  or  a second  order  Markov  chain.  More  effective 
were  definitions  based  on  a certain  number  of  degrees  above  a moving  weekly  mean  maximum  (e.g., 
3°F,  6°F,  9°F,  ...  above  the  moving  mean).  Even  with  relative  thresholds  such  as  this,  it  was 
shown that the value of the relative threshold (e.g., 6° or 9°) dramatically affects the
performance of the Markov model and the suitability of a particular order of the chain.

By resorting to relative criteria for "hot" and "cold" days, Sakamoto found he was able to
use a first order Markov chain to approximate the probability of hot and cold spells of various
lengths. Use of second order Markov models produced a slight improvement in the resulting
probabilities, but the small gain in precision was not sufficient to justify the added computational
expense in developing and applying the more complex models.

Similar work in applying first order Markov chains to cool and hot spells has been reported
by Caskey (1964) and Spiegel (1966), who indicated the first order model applied well enough.
Caskey (1964) applied the model of Gabriel and Neumann (1962), discussed below, to the problem of
calculating the probability of cold spells in London. The model equation is

$$\Pr(C, n) = [1 - p(C \mid C)]\; p(C \mid C)^{\,n - n_0}, \qquad n \ge n_0$$

where Pr(C,n) represents the probability of n days of cold (C) weather, i.e., a cold spell of
n days' duration. The conditional probability p(C|C) represents the likelihood that a day will
be cold given that the previous day was cold, a constant independent of n. Note that the exponent
is not the number of days of the cold spell n but rather the number of days by which the cold
spell exceeds the baseline length $n_0$, given by Caskey as four. This is because the probability
that a cold spell of length $n_0$ will continue k further days is $p(C \mid C)^{k}$, where

$$k = n - n_0$$
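
A short numerical sketch of the cold spell equation follows. It reads Pr(C,n) as the probability that a spell which has already reached the baseline length ends after exactly n days; the value of p(C|C) is invented for illustration, and the function name is hypothetical.

```python
def cold_spell_probability(n, p_cc, n0=4):
    """Pr(C, n) = [1 - p(C|C)] * p(C|C) ** (n - n0), defined for n >= n0.

    p_cc : p(C|C), probability that a cold day is followed by a cold day
    n0   : baseline spell length (four days in Caskey's application)
    """
    if n < n0:
        raise ValueError("the model applies only to spells of at least n0 days")
    return (1.0 - p_cc) * p_cc ** (n - n0)

p_cc = 0.7   # hypothetical persistence probability
for n in range(4, 9):
    print(n, round(cold_spell_probability(n, p_cc), 4))

# Summed over all n >= n0, the spell probabilities approach one.
total = sum(cold_spell_probability(n, p_cc) for n in range(4, 400))
```

The exponent counts days beyond the baseline, so the probabilities for n = 4, 5, 6, ... form a geometric series with ratio p(C|C).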

Other  authors  have  considered  whether  sequences  of  wet  and  dry  days  can  adequately  be 
represented  by  first  order  Markov  chains.  Caskey  (1963),  Feyerherm  and  Bark  (1965)  and  Gabriel 
and  Neumann  (1962)  report  that  the  first  order  Markov  process  is  generally  adequate  for  this 
purpose,  although  in  some  cases  it  has  not  been  completely  satisfactory.  Spiegel  (1966)  and 
Feyerherm  and  Bark  (1965)  tested  higher  order  Markov  models.  While  these  models  did  show  slight 
improvement  over  the  first  order  Markov  chain,  the  improvement  did  not  seem  to  justify  the 
greater  computational  expense  involved  in  development  and  application  of  the  more  complex  models. 

c.  Probability  of  Precipitation  or  No  Precipitation  in  an  Interval  of  Time. 

If we consider that there are two kinds of days, dry (D) and wet (W), then the probability
Pr(W,n) that precipitation will occur at some time during an interval of n days can be given in
terms of the probability q(W,n-1) of precipitation in n-1 days and the conditional probability
p(W,n | D,n-1) of a wet nth day following a period of n-1 dry days.

As is the case in many combinatorial problems, this problem is best solved by considering the
complementary probabilities, i.e., the probability of dry weather. The probability Pr(D,n) of
n consecutive dry days is simply one minus the probability Pr(W,n) of precipitation occurrence on
any day among the n days. Mathematically,

$$\Pr(D,n) = 1 - \Pr(W,n)$$

Likewise, the conditional probability of a dry day following a period of n-1 dry days is simply
one minus the conditional probability p(W,n | D,n-1) of a wet day following n-1 dry days, i.e.,

$$p(D,n \mid D,n-1) = 1 - p(W,n \mid D,n-1)$$

and the marginal probability q(D,n-1) of n-1 dry days is one minus the marginal probability
q(W,n-1) of precipitation at some time in n-1 days:

$$q(D,n-1) = 1 - q(W,n-1)$$

Thus the probability of n consecutive dry days is

$$\Pr(D,n) = q(D,n-1)\; p(D,n \mid D,n-1)$$

In terms of complementary probabilities, this is

$$1 - \Pr(W,n) = [1 - q(W,n-1)] \cdot [1 - p(W,n \mid D,n-1)]$$

Expanding and rearranging the equation yields the recursive form,

$$\Pr(W,n) = q(W,n-1) \cdot [1 - p(W,n \mid D,n-1)] + p(W,n \mid D,n-1)$$

If the process is first order Markov, the conditional probability p(W,n | D,n-1) that the
nth day will be wet following n-1 dry days is influenced solely by whether the day immediately
preceding day n is dry or wet. For consistency with our earlier notation, let us say that for
the first order Markov process,

$$p(W,n \mid D,n-1) = p(W \mid D)$$

Under these circumstances, the model equation becomes

$$\Pr(W,n) = q(W,n-1) \cdot [1 - p(W \mid D)] + p(W \mid D)$$

Caskey (1963) shows that this equation reduces to

$$\Pr(W,n) = 1 - [1 - q(W)] \cdot [1 - p(W \mid D)]^{\,n-1}$$

where q(W) is the marginal probability of any day being wet.

If this model equation is to be applied to real problems, then values of q(W) and p(W|D) must
be obtained from the historical data. Two choices are open. Either we can estimate both q(W)
and p(W|D) directly from the data; or we can obtain p(W|D) and p(W|W) from the data and then
obtain q(W) from the identity of Gabriel and Neumann (1962):

$$q(W) = p(W \mid D) \cdot [1 - p(W \mid W) + p(W \mid D)]^{-1}$$

Consistent with our notation, p(W|W) is the conditional probability that a day will be wet given
that the previous day was wet.
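
The relation between the recursive and closed forms of Pr(W,n) can be checked numerically. The sketch below is illustrative only: the transition probabilities are invented, the function names are hypothetical, and the closed form is written with the exponent n - 1, which is what the recursion produces when it is started from Pr(W,1) = q(W).

```python
def marginal_wet(p_w_given_d, p_w_given_w):
    """q(W) from the identity q(W) = p(W|D) / [1 - p(W|W) + p(W|D)]."""
    return p_w_given_d / (1.0 - p_w_given_w + p_w_given_d)

def rain_in_n_days_recursive(n, q_w, p_w_given_d):
    """Apply Pr(W,n) = Pr(W,n-1) [1 - p(W|D)] + p(W|D), starting from Pr(W,1) = q(W)."""
    prob = q_w
    for _ in range(n - 1):
        prob = prob * (1.0 - p_w_given_d) + p_w_given_d
    return prob

def rain_in_n_days_closed(n, q_w, p_w_given_d):
    """Closed form: Pr(W,n) = 1 - [1 - q(W)] [1 - p(W|D)] ** (n - 1)."""
    return 1.0 - (1.0 - q_w) * (1.0 - p_w_given_d) ** (n - 1)

# Hypothetical transition probabilities.
p_wd, p_ww = 0.25, 0.65
q_w = marginal_wet(p_wd, p_ww)
for n in range(1, 8):
    print(n,
          round(rain_in_n_days_recursive(n, q_w, p_wd), 4),
          round(rain_in_n_days_closed(n, q_w, p_wd), 4))
```

The two columns agree, and both rise toward one as the interval lengthens, since a completely dry interval of n days becomes increasingly unlikely.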

In  applying  this  model  to  actual  rainfall  data,  Caskey  (1963)  found  that  sequences  of  wet 
and  dry  days  can  be  represented  adequately  by  the  first  order  Markov  chain. 

A first  order  Markov  model  suitable  for  estimating  the  probability  of  wet  and  dry  spells  of 
arbitrary  length,  the  probability  of  exactly  s wet  days  among  n days  following  a wet  or  dry  day, 
the  probability  of  s wet  days  among  any  n days,  and  the  probability  of  a weather  cycle  of  n days 
has  been  formulated  by  Gabriel  and  Neumann  (1957,  1962)  and  Gabriel  (1959),  who  found  the  model 
fits Tel Aviv daily rainfall data quite well.

In developing the model, Gabriel and Neumann (1957) reasoned that the lengths of wet and
dry spells conform to a so called "geometric" distribution (Feller, 1950, p 217) to a good
degree of approximation. Under the geometric distribution, the probability that a positive,
integral-valued random variable X equals a particular integer k is given by

$$\Pr(X = k) = [1 - p]\; p^{\,k-1}$$

where k = 1, 2, 3, ... and p is a conditional probability. We can particularize this to the
problem of wet and dry spells. We reason that p(W|W) is the probability of a wet day followed
by another wet day, i.e., the probability of two wet days in sequence. Likewise, the probability
of three wet days in sequence is

$$p(W \mid W)\; p(W \mid W)$$

and of four wet days is

$$p(W \mid W)\; p(W \mid W)\; p(W \mid W)$$

Generalizing, it is apparent that the probability of k wet days in a row is

$$p(W \mid W)^{\,k-1}$$

which can be interpreted as the probability that a wet day will be followed by k-1 wet days.
The probability p(D|W) that a wet day will be followed by a dry day is the complement of
p(W|W), so

$$p(D \mid W) = 1 - p(W \mid W)$$

We may think of a wet spell as a sequence of k wet days terminated by at least one dry day. Thus
the probability of a wet spell of exactly k days is the probability of k wet days followed by
a dry day, or

$$\Pr(W,k) = p(D \mid W) \cdot p(W \mid W)^{\,k-1}$$

or

$$\Pr(W,k) = [1 - p(W \mid W)] \cdot p(W \mid W)^{\,k-1}$$

which is identical with Feller's geometric distribution, given above. The same reasoning
can be used to obtain the probability of a dry spell of length k:

$$\Pr(D,k) = [1 - p(D \mid D)] \cdot p(D \mid D)^{\,k-1}$$
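
As a brief illustration of the spell length distributions, the sketch below tabulates the wet spell probabilities for an invented value of p(W|W) and checks two standard properties of the geometric distribution: the probabilities sum to one, and the mean spell length is 1 / [1 - p(W|W)]. The function name and the numbers are hypothetical.

```python
def spell_length_probability(k, p_stay):
    """Probability that a spell lasts exactly k days, k = 1, 2, 3, ...

    p_stay : probability that the spell continues one more day,
             i.e. p(W|W) for wet spells or p(D|D) for dry spells
    """
    return (1.0 - p_stay) * p_stay ** (k - 1)

p_ww = 0.65   # hypothetical wet-day persistence
lengths = range(1, 500)
total = sum(spell_length_probability(k, p_ww) for k in lengths)
mean_length = sum(k * spell_length_probability(k, p_ww) for k in lengths)

print(round(total, 6), round(mean_length, 4), round(1.0 / (1.0 - p_ww), 4))
```

The same function serves for dry spells when p(D|D) is supplied in place of p(W|W).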

Obviously, we are working with a Markov transition matrix:

                            Current Day
                            Dry (D)              Wet (W)

    Previous      Dry       p_DD = p(D|D)        p_DW = p(W|D)
    Day           Wet       p_WD = p(D|W)        p_WW = p(W|W)

so that

$$p(D \mid D) + p(W \mid D) = 1$$

$$p(D \mid W) + p(W \mid W) = 1$$

In such a model, the probability of rainfall i days after a wet day is

$$\Pr(W, W+i) = q(W) + [1 - q(W)] \cdot [p(W \mid W) - p(W \mid D)]^{\,i}$$

where q(W) is the marginal probability of a wet day, given by

$$q(W) = \frac{p(W \mid D)}{1 - p(W \mid W) + p(W \mid D)}$$

The probability of rainfall i days after a dry day is, by analogy,

$$\Pr(W, D+i) = q(W) - q(W) \cdot [p(W \mid W) - p(W \mid D)]^{\,i}$$

It is apparent that Pr(W,W+i) and Pr(W,D+i), the probabilities of rainfall i days after a
wet or dry day, respectively, both converge to q(W), the marginal probability of a wet day, with
increasing i. This is the Markov process gradually "forgetting" its initial state. The marginal
probability may be estimated from

$$q(W) = p(W \mid D) \cdot [1 - p(W \mid W) + p(W \mid D)]^{-1}$$

as shown above.
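
The closed forms for rainfall i days after a wet or a dry day can be checked against direct day-by-day iteration of the two-state chain. In the sketch below the dry-day form is written with the sign that makes it agree with that iteration (for i = 1 it must reduce to p(W|D)). The transition probabilities are invented and the function names are hypothetical.

```python
def wet_probability_closed(i, start_wet, p_w_given_d, p_w_given_w):
    """Probability of a wet day i days after a wet (or dry) day, closed form."""
    q_w = p_w_given_d / (1.0 - p_w_given_w + p_w_given_d)   # marginal q(W)
    decay = (p_w_given_w - p_w_given_d) ** i                # memory of the start
    if start_wet:
        return q_w + (1.0 - q_w) * decay
    return q_w - q_w * decay

def wet_probability_iterated(i, start_wet, p_w_given_d, p_w_given_w):
    """Same quantity, obtained by stepping the chain forward one day at a time."""
    prob_wet = 1.0 if start_wet else 0.0
    for _ in range(i):
        prob_wet = prob_wet * p_w_given_w + (1.0 - prob_wet) * p_w_given_d
    return prob_wet

p_wd, p_ww = 0.25, 0.65   # hypothetical values
for i in (1, 2, 3, 5, 10):
    print(i,
          round(wet_probability_closed(i, True, p_wd, p_ww), 4),
          round(wet_probability_closed(i, False, p_wd, p_ww), 4))
```

Both columns approach q(W) as i increases, the wet-start column from above and the dry-start column from below, provided p(W|W) exceeds p(W|D).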


Defining a weather cycle as a combination of a wet spell and an adjacent dry spell, Gabriel
and Neumann (1962) found the probability of a weather cycle of length n days is

$$\Pr(C,n) = p(W \mid D) \cdot [1 - p(W \mid W)] \cdot \frac{[1 - p(W \mid D)]^{\,n-1} - p(W \mid W)^{\,n-1}}{1 - p(W \mid D) - p(W \mid W)}$$
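
The weather cycle formula can be understood as the sum, over every way of splitting n days into a wet spell of j days and a dry spell of n - j days, of the product of the two geometric spell probabilities. The sketch below compares that sum with the closed form. It is an illustration under invented transition probabilities; the function names are hypothetical, and the closed form cannot be evaluated when p(W|D) + p(W|W) = 1, since its denominator vanishes.

```python
def cycle_probability_closed(n, p_w_given_d, p_w_given_w):
    """Closed form for the probability of a wet-plus-dry cycle of n days (n >= 2)."""
    p_dd = 1.0 - p_w_given_d                       # p(D|D)
    numerator = p_dd ** (n - 1) - p_w_given_w ** (n - 1)
    denominator = p_dd - p_w_given_w               # = 1 - p(W|D) - p(W|W)
    return p_w_given_d * (1.0 - p_w_given_w) * numerator / denominator

def cycle_probability_summed(n, p_w_given_d, p_w_given_w):
    """Sum over wet spells of j days followed by dry spells of n - j days."""
    total = 0.0
    for j in range(1, n):
        wet_spell = (1.0 - p_w_given_w) * p_w_given_w ** (j - 1)
        dry_spell = p_w_given_d * (1.0 - p_w_given_d) ** (n - j - 1)
        total += wet_spell * dry_spell
    return total

p_wd, p_ww = 0.25, 0.65   # hypothetical values
for n in range(2, 8):
    print(n,
          round(cycle_probability_closed(n, p_wd, p_ww), 5),
          round(cycle_probability_summed(n, p_wd, p_ww), 5))
```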

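The probability of exactly s wet days among n days following a wet day is

$$\Pr(sW \mid W+n) = p(W \mid W)^{s}\, [1 - p(W \mid D)]^{\,n-s} \sum_{c=1}^{c_1} \binom{s}{a} \binom{n-s-1}{b-1} \left[\frac{1 - p(W \mid W)}{1 - p(W \mid D)}\right]^{b} \left[\frac{p(W \mid D)}{p(W \mid W)}\right]^{a}$$

where

$$c_1 = \begin{cases} n + \tfrac{1}{2} - \left| 2s - n + \tfrac{1}{2} \right| & \text{for } s < n \\ 0 & \text{for } s = n \text{ (and the sum then involves only this term, } c = 0\text{)} \end{cases}$$

and where a and b are the least integers not smaller than (c-1)/2 and c/2, respectively.
Similarly, the probability of exactly s wet days among n days following a dry day is

$$\Pr(sW \mid D+n) = p(W \mid W)^{s}\, [1 - p(W \mid D)]^{\,n-s} \sum_{c=1}^{c_0} \binom{s-1}{b-1} \binom{n-s}{a} \left[\frac{1 - p(W \mid W)}{1 - p(W \mid D)}\right]^{a} \left[\frac{p(W \mid D)}{p(W \mid W)}\right]^{b}$$

where

$$c_0 = \begin{cases} n + \tfrac{1}{2} - \left| 2s - n - \tfrac{1}{2} \right| & \text{for } s > 0 \\ 0 & \text{for } s = 0 \text{ (and the sum then involves only this term, } c = 0\text{)} \end{cases}$$

The probability of s wet among any n days is

$$\Pr(sW \mid n) = q(W)\, \Pr(sW \mid W+n) + [1 - q(W)]\, \Pr(sW \mid D+n)$$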
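
Because these expressions are easy to mistranscribe, it is worth checking them against brute-force enumeration of every possible n-day sequence. The sketch below does so under explicit assumptions: the summation index c is read as the number of changes of state during the n days, a and b are the least integers not smaller than (c-1)/2 and c/2, and the cases s = n (after a wet day) and s = 0 (after a dry day) are handled as single-term special cases. The function names and transition probabilities are hypothetical; p(W|W) must be nonzero and p(W|D) less than one.

```python
from itertools import product
from math import comb

def prob_s_wet(s, n, start_wet, p_wd, p_ww):
    """Probability of exactly s wet days among the n days after a wet or dry day.

    p_wd = p(W|D), p_ww = p(W|W).
    """
    if start_wet:
        if s == n:                           # no change of state: all n days wet
            return p_ww ** n
        c_max = int(n + 0.5 - abs(2 * s - n + 0.5))
    else:
        if s == 0:                           # no change of state: all n days dry
            return (1.0 - p_wd) ** n
        c_max = int(n + 0.5 - abs(2 * s - n - 0.5))
    dry_ratio = (1.0 - p_ww) / (1.0 - p_wd)
    wet_ratio = p_wd / p_ww
    total = 0.0
    for c in range(1, c_max + 1):            # c = number of changes of state
        a = c // 2                           # least integer >= (c - 1) / 2
        b = (c + 1) // 2                     # least integer >= c / 2
        if start_wet:
            ways = comb(s, a) * comb(n - s - 1, b - 1)
            total += ways * dry_ratio ** b * wet_ratio ** a
        else:
            ways = comb(s - 1, b - 1) * comb(n - s, a)
            total += ways * dry_ratio ** a * wet_ratio ** b
    return p_ww ** s * (1.0 - p_wd) ** (n - s) * total

def prob_s_wet_bruteforce(s, n, start_wet, p_wd, p_ww):
    """Same probability by summing over every n-day wet/dry sequence."""
    total = 0.0
    for days in product((True, False), repeat=n):    # True marks a wet day
        if sum(days) != s:
            continue
        prob, previous_wet = 1.0, start_wet
        for wet in days:
            p_wet = p_ww if previous_wet else p_wd
            prob *= p_wet if wet else 1.0 - p_wet
            previous_wet = wet
        total += prob
    return total

def prob_s_wet_any(s, n, p_wd, p_ww):
    """Probability of s wet days among any n days, weighting by q(W)."""
    q_w = p_wd / (1.0 - p_ww + p_wd)
    return (q_w * prob_s_wet(s, n, True, p_wd, p_ww)
            + (1.0 - q_w) * prob_s_wet(s, n, False, p_wd, p_ww))
```

For small n the two routes agree to rounding error, and for each starting state the probabilities for s = 0, 1, ..., n sum to one.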

Gabriel and Neumann (1962) suggest that the close fit of their simple, first order Markov
model to daily rainfall in Tel Aviv implies the distribution of wet and dry spells is not
"periodic" or "harmonic" to any appreciable extent in the location and at the scale considered.
Furthermore, in order to fit the Markov model's requirement for the independence of day n rainfall
from that on any day earlier than n-1, the daily rainfall, at least at Tel Aviv, must be
among those weather elements that "decouple" or decorrelate rapidly. Presumably, certain other
weather elements might not decouple as fast as the daily rainfall and might therefore not be
modeled as well by a first order Markov process. Other weather elements might decouple faster,
perhaps allowing use of a model in which the weather on day n is assumed to be entirely independent
of weather on all previous days. The Bernoulli process (Bernoulli trials), whose outcomes conform
to the binomial distribution, might then be a more appropriate model than the Markov chain.

16. Development of the Regression Equations; Generalized vs. Localized Operators.

We have seen that an equivalent Markov prognostic model can be developed by using a multiple
linear regression scheme to produce a probability generating function T in matrix form. Screening
regression would logically be used to select the predictors included in the multiple linear
regression relationships whose coefficients define T.


Left unresolved in this analysis has been the fundamental question of what shall be included
in the data set to be fed to screening regression to establish the multiple linear regression
relationships. Classically, the approach has been to use only predictand data from the individual
station for which forecasts are desired. The resulting multiple linear regression prediction
equations (and probability generating function T) then apply only at the station in question.
There might in addition be an effort to stratify the data by season, thus obtaining several
sets of statistical prediction equations, each peculiar to a particular station and season.
Such sets of equations are said to be localized operators in that their applicability is local
in space, time, or both. By far the greater number of local forecast studies have been made in
this way (George, 1960), presumably because their authors felt better skill could be obtained
by particularizing the predictands to one time and place.


This concept has been successfully challenged by Harris, Bryan and MacMonegle (1963), who
advanced the concept of a generalized statistical operator as a viable alternative to the localized
operator. A generalized operator is a statistically derived specification or prediction equation
that has general applicability throughout different times of year and at different geographical
locations (Harris, et al., 1963).


Under  the  generalized  operator  concept,  data  are  not  stratified  by  predictand  station  and 
season  but  rather  are  grouped  by  continent-size  region  or  possibly  into  a single,  worldwide  sample. 

Under  those  circumstances,  the  prediction  equations  derived  apply  at  any  location  in  the  region  or 
world,  not  just  at  one  point.  It  is  well  to  point  out  that  although  the  statistical  prediction 
equations  thus  developed  are  generalized  and  apply  over  a large  geographical  region,  the  predictor 
values  have  to  be  computed  separately  for  each  point  at  which  the  equations  are  to  be  solved. 

Otherwise,  of  course,  the  scheme  would  predict  homogeneous  weather  over  the  entire  region. 

The  justification  for  the  generalized  operator  concept  is  ultimately  that  the  atmosphere  obeys 
the  same  physical  laws  over  Tennessee  that  it  obeys  over  Texas,  Tahiti  or  Tibet.  There  is  no 
seasonal or geographical variability in the universal physical laws governing the motion, composition
and state of the atmospheric fluid. Therefore, if statistical relationships can somehow be
made to capture the essential physics of the atmosphere rather than be dominated by temporal and
spatial  variation  in  the  absolute  value  of  some  of  the  atmosphere's  variables,  then  the  prognostic 
applicability  of  those  relationships  will  not  be  limited  in  space  and  time. 

Indeed,  generalized  operators  are  not  new.  Dynamical  weather  prediction  models  are  among  the 
best  examples  of  generalized  deterministic  operators.  In  these  models,  known  physical  laws  are 
expressed  mathematically  in  the  form  of  a somewhat  simplified  model  of  the  atmosphere.  Analogous 
to  these  generalized  hydrodynamical  equations  in  the  deterministic  domain  are  the  generalized 
statistical  operators  in  the  domain  of  uncertainty.  Limitations  in  numerical  weather  prediction 
models  and  error  in  meteorological  observations  now  prevent  dynamical,  deterministic  prediction  of 
surface weather elements such as ceiling and visibility. Under these circumstances, statistical
operators  in  a mixed  dynamical-statistical  prediction  scheme  can  help  bridge  the  gap  between  model 
capabilities  and  user  expectations,  producing  probabilistic  forecasts  of  surface  weather  elements 
based  on  dynamical  predictions  of  upper  air  patterns. 

From a practical point of view, there exist several reasons for preferring generalized
statistical  operators  over  their  localized  equivalents: 

- The meteorologist is frequently required to forecast the weather at a location
having either no historical weather data at all or having a period of record
inadequate for preparation of classical, localized operator forecast studies. Part
of the job of the military meteorologist is to prepare himself to forecast for
Bare Bases, remote sites and forward areas that he never heard of before being
asked to forecast for them.

- Where there is a sudden demand for forecasts for many locations, as there
would be in wartime, the derivation of a host of localized statistical operators may
require more time than is available or involve too great a cost.

For these reasons, it is well to proceed as far as possible with the generalized operator
concept. To do so, means must be found by which to subtract climatic and seasonal variability
from the atmospheric variables entering the prediction scheme as predictors and predictands. It
is not unreasonable to expect some success in this search to remove geography and season. Our
science shows us that whereas the absolute values of many atmospheric variables may be influenced
by seasonal climatic regime and location, it is often the relative measures such as changes and
gradients in parameters like pressure or moisture that dictate the production of weather. Accordingly,
it ought to be possible to treat or somehow transform these atmospheric variables such
that their seasonal and geographical variability is removed. Simple methods of doing this are to
use sea level pressure instead of station pressure, to give the temperature in terms of its deviation
from the normal, or to use wind anomaly in lieu of raw wind data. We shall see below that
better means are available for doing this sort of thing. For now, it suffices to know that we
will have removed as much as possible of the geographical and seasonal variability of the parameters
to be used as predictors and predictands in a generalized statistical operator.

How does the generalized operator work? As shown in Figure 5, one starts by removing the
geographical and seasonal variability from the parameters Xj to be used as predictors. Sometimes,
in addition, one normalizes the variables if one's prediction scheme requires normally distributed
input data. The treated predictors are shown as X'j in Figure 5. Next the treated predictors are
fed to the generalized statistical operator, which might be an equivalent Markov regression scheme
such as the one discussed above, or perhaps a discriminant function. The output of the generalized
operator is a probabilistic forecast P(Y'j) for each Y'j. A reverse transformation is used to obtain
P(Yj) for Yj in the society's conventional units of measurement. The statistical operator is
general in that it is used for all locations and times, whereas the transformations and inverse
transformations are local, being peculiar to a time and place.


[Figure 5: flow diagram. The predictors X1, X2, X3, ..., Xj, ..., XP-1, XP enter a block labeled
"Remove Geographical & Seasonal Variability (Normal Transformation)", which outputs the treated
predictors X'1, ..., X'P. These enter the block "Generalized Statistical Operator", which outputs
P(Y'1), P(Y'2), P(Y'3), ..., P(Y'j), ..., P(Y'P-1), P(Y'P). A final block, "Restore Geographical &
Seasonal Variability", outputs P(Y1), ..., P(YP).]

Figure 5. Use of a generalized statistical operator.


Generalized  statistical  operators  such  as  this  are  of  two  types,  the  single  station  operator 
and  the  network  operator.  In  short  period  prediction,  persistence  dominates  and  the  observation 
at  the  station  itself  is  found  to  provide  a large  share  of  the  information  necessary  for  successful 
prediction.  Only  the  single  station  operator  will  be  discussed  to  any  great  extent  in  this  chapter. 

Harris, Bryan and MacMonegle (1963) developed generalized statistical operators of both the
single station and the network type for terminal forecasting at arbitrary locations over extensive
geographical areas. The single station operators could use as predictors only data from the
predictand location itself. Predictands included ceiling, visibility and total cloud amount in 2, 3,
6, 12, 18 and 24-hour forecasts. Predictors emphasized the primary Sutcliffe development equation
terms, vorticity advection and thermal (thickness) advection. Developed from dependent data, the
generalized operators were tested against a withheld, or independent, data sample.

To remove seasonal and geographical variability from parameters used as predictors and
predictands, Harris, et al., used transformed variables called standardized anomalies. These are
essentially the anomaly A(Xij) of the ith observation of the jth variable Xj, where

$$A(X_{ij}) = X_{ij} - \bar{X}_j, \qquad \bar{X}_j = \frac{1}{N}\sum_{i=1}^{N} X_{ij},$$

expressed in units of the variability of the variable. Specifically, the standardized anomaly
S(Xij) of the ith observation of the jth variable is given by

$$S(X_{ij}) = \frac{X_{ij} - \bar{X}_j}{\sigma(X_j)}$$

where σ(Xj) is the standard deviation of the jth variable. Both the mean of Xj and σ(Xj) are
local to the time and place where Xj is observed.



Why divide by the standard deviation? This is done because the natural variability of the
weather differs from one place to another. Probability theory and common sense tell us that "an
anomaly is much more anomalous" if it represents a 2σ deviation from the mean than if it constitutes
a mere 1σ deviation. Therefore, the standardized anomaly, which expresses the departure from the
mean as a number of standard deviations, is better related
to the probability density of the variable Xj than the raw anomaly A(Xij) would be. Furthermore,
the standardized anomaly is more free of seasonal and geographical variability than is the anomaly
itself.
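
The standardized anomaly can be illustrated with a few lines of Python. This is a minimal sketch, not the procedure of Harris, et al.: the function name and the sample values are hypothetical, and the use of the population standard deviation (rather than the sample form) is an assumption, since the text does not say which was used.

```python
from statistics import mean, pstdev


def standardized_anomalies(values):
    """Convert raw observations at one place and season to standardized anomalies.

    Each value is expressed as its departure from the local mean,
    divided by the local standard deviation.
    """
    local_mean = mean(values)
    local_sigma = pstdev(values)  # assumption: population standard deviation
    if local_sigma == 0:
        raise ValueError("standard deviation is zero; anomalies are undefined")
    return [(x - local_mean) / local_sigma for x in values]


if __name__ == "__main__":
    # Two hypothetical stations whose raw values differ greatly in mean and spread.
    station_a = [10.0, 12.0, 14.0, 16.0, 18.0]
    station_b = [100.0, 120.0, 140.0, 160.0, 180.0]
    print(standardized_anomalies(station_a))
    print(standardized_anomalies(station_b))
```

Although the two hypothetical stations differ greatly in raw magnitude, their standardized anomalies coincide, which is the property that allows data from different places and seasons to be pooled into one development sample.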

With normalized predictor and predictand data expressed in terms of standardized anomalies,
Harris, et al. (1963) then used the method of screening regression to develop predictor/predictand
relationships.

To test whether the generalized operator concept could compete effectively with the more
traditional method of developing single-station relationships, Harris, et al., developed two sets
of predictor/predictand relations, each constructed from a different dependent data set:

- The generalized operator, single-station relations were developed by
combining normalized standard anomaly data from several nearby stations,
with the data from one other station being withheld. For example, one data
group included Syracuse, NY; Brattleboro, VT; Hanscom Field, MA; and Providence,
RI, with Westover AFB, MA, being withheld.

- The localized operator, single-station relations were developed using only
data from the station withheld from the generalized operator data set, i.e.,
Westover AFB.

The predictors and predictands selected from the generalized operator forecasting relations were
also used in the localized operator equations.

With two sets of single station prediction equations developed (one generalized operator set
and one localized operator set), both sets were used to make forecasts on independent data for the
withheld station (i.e., Westover AFB). A simple persistence forecast was also run as a baseline
for evaluation of forecasting skill. Two scores were used to compare performance of the competing
forecast schemes: percentage of hits and the Heidke skill score. Prefigurance and postagreement
were also computed.
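
The two verification scores named above can be sketched from a square contingency table of forecast category versus observed category. The sketch assumes the standard textbook definition of the Heidke skill score (hits in excess of those expected by chance, divided by the maximum possible excess); the report does not reproduce the exact formula Harris, et al. used, and the table values below are hypothetical.

```python
def percent_hits(table):
    """Percentage of forecasts that fell in the observed category.

    table[i][j] = number of cases forecast in category i and observed in category j.
    """
    total = sum(sum(row) for row in table)
    hits = sum(table[i][i] for i in range(len(table)))
    return 100.0 * hits / total


def heidke_skill_score(table):
    """Heidke skill score: (hits - chance hits) / (total - chance hits)."""
    k = len(table)
    total = sum(sum(row) for row in table)
    hits = sum(table[i][i] for i in range(k))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Hits expected by chance from the marginal totals alone.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total
    return (hits - expected) / (total - expected)


if __name__ == "__main__":
    # Hypothetical two-category verification table (e.g., below/above a ceiling limit).
    table = [[30, 10],
             [5, 55]]
    print(percent_hits(table), heidke_skill_score(table))
```

A score of 1 indicates a perfect set of forecasts, 0 indicates no improvement over chance, and negative values indicate forecasts worse than chance; the same function can be applied to a persistence forecast to provide the baseline.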

In ceiling forecasts, both techniques beat persistence beyond about 6-8 hours, and the
generalized single station operator performed about as well as the localized single station equation.
The same picture held for visibility, except the statistical forecast techniques beat persistence
at all forecast periods beyond 2 hours. In forecasting the total cloud amount, both single station
techniques lost to persistence for all forecast periods, although the generalized operator method
again did about as well (or as poorly) as the localized operator technique.

Harris, et al. felt that the power of their prediction scheme was not seriously limited by
its use of a multiple linear regression equation to generate the forecast probabilities. Although
the regression equation itself is assuredly linear, the predictors need not be. In fact, Harris,
et al. (1963, 1965) used as one of their predictors the highly non-linear vorticity advection:


Although prognostic techniques based on a network approach are strictly beyond the scope of
a chapter on single station forecasting, it is well to point out that Harris, et al. (1963) tested
generalized network operators as well as generalized single station operators. The network operators,
except for forecasts less than about 3 hours, improved upon persistence in ceiling, visibility
and total cloud amount forecasting. The network techniques were in general less skillful than
single station methods for short range forecasts (less than about 3-6 hours), while at the longer
forecast periods, the networks' ability to diagnose the advection of meteorological fields gave
them the edge over single station techniques. Interestingly, although the single station approach
had significantly fewer "hits" than persistence at all forecast periods in predicting total cloud
amount, the network techniques had more hits than persistence for all such cloud amount forecasts
except the 2-hour one. It appears cloud amount forecasting benefits more from having available
a spatial field of information than does either ceiling or visibility forecasting.

In 1965, Harris, Bryan and MacMonegle extended their 1963 work using generalized operators.
They retained their basic predictive scheme in which the probability of surface weather elements
was forecast by generalized statistical operators using as predictors only parameters derivable
from 500, 700 and 1000 mb analyses. The earlier equations (Harris, et al., 1963) emphasized
the primary Sutcliffe terms, vorticity advection and thermal (thickness) advection. The later work
(Harris, et al., 1965) included as predictors such permanent effects as orography and coastal
influence, the former providing a means of including forced vertical motion and the latter providing
for moisture sources. Also included as predictors were time-of-day parameterizations of the incident
solar radiation. In contrast to today's model output statistics (MOS), Harris, et al.
employed a "perfect prog" concept, where predictors were taken from analyses rather than prognoses.
In the 1965 work, five predictands were used: ceiling, visibility, total cloud amount, integrated
operating condition (ceiling/visibility categories), and occurrence of precipitation. The generalized
operators for each predictand were developed based on a sample of 16 stations spread evenly
throughout the United States east of 100°W. Another sample of 15 stations for the same area was
withheld from development and used as independent data in later tests of the method. Altogether,
data from 31 stations were used either as a development sample or an independent test sample.

Three sets of prediction equations were prepared. The first set was permitted to use only the
purely meteorological parameters used in the 1963 work, such as vorticity advection and thickness
advection. The second set of equations was permitted to use these plus the orographic and coastal
effects terms. In deriving the final set of equations, the screening regression scheme was also
permitted to select radiation (time-of-day) parameters. Having three sets of prediction equations
allowed the authors to examine the incremental prognostic value provided by the additive terms.
It was found that the orographic and coastal effects predictors added significant information to
the estimation of all predictands in one or more seasons of the year. Forecast results along the
West Coast and along the eastern slopes of the Rockies were particularly improved by the orographic
and coastal effects terms. The time-of-day or radiation parameters added even more significant
information when permitted to enter the schemes for forecasting all elements except precipitation.
In fact, time-of-day was selected more frequently than any other predictor in equations forecasting
the visibility.

The prediction schemes of Harris, et al. (1965) improved upon "pure climatology" forecasts
for all predictands and all seasons, with the greatest improvement occurring during fall and winter.
The relations derived held up when applied to independent data. The prognostic schemes proved
weakest in mountainous regions. The method using generalized operators was able to relate changes
in upper air patterns used as predictors to changes in surface weather (predictands). The predictand
fields could be analyzed much as weather maps themselves are. Features showed continuity
in space and time and retained a good correspondence with upper air patterns. The statistical
methods were able to produce realistically tight gradients separating large regions of fair skies
from small areas of adverse weather.

In  applying  the  method  of  generalized  operators,  there  are  three  problems  to  be  overcome. 

First, predictors must be devised that adequately characterize the physical processes producing
the weather elements being forecast. Parameters such as thickness advection, vorticity advection
and pressure change are good candidates. The parameters devised as predictors must be feasible
for computation on an operational basis.

Second, climatological biasing of the values of the predictor and predictand parameters
selected must somehow be removed or the resulting prediction equations will be unduly specific to
one place and time.

Third, physical characteristics of a particular locality may generate strong predictor/
predictand relationships that are not generally valid over the region as a whole. These local
effects must not be allowed to contaminate the generalized operator. It is generally possible to
devise and add to the list of candidate predictors various parameterizations of local effects
(e.g., station elevation, onshore component of the wind). Then, the generalized statistical relationship
is developed from a sample of data including many stations. Under these circumstances,
there should only be minimal contamination of the generalized operator by station-peculiar
relationships.


Single Station Forecasting Models in an Operational Setting


Single station forecasting techniques such as those presented in this chapter are particularly
feasible for operational application because of their simplicity. Once such a model is developed,
the user can obtain a probabilistic forecast of all future states of the weather at his location
simply by entering the conditions he observes at the time the forecast is made. No observations
from outlying stations are needed. If the forecasting model is developed as a generalized operator,
such that it has wide applicability over a large region, then the model constitutes a powerful,
mobile, stand-alone weather support capability, accompanying the forecaster on a tactical operation,
jumping with him into an assault zone, or moving with him from place to place as the command element
he supports is relocated. Provided forecasts are desired only for the forecaster's immediate
vicinity, the models presented in this chapter have no need of weather communications. Nor do
these models depend on uncertain assistance from centralized weather support facilities that may
themselves be under attack or may be otherwise occupied with higher priority tasks.

A tactical scenario for single station Markov models is not difficult to imagine. Under
extremely primitive circumstances, the forecaster could be equipped simply with a small book
containing the several hierarchical matrix forms Cj needed to compute the equivalent Markov
model. The actual computations could be done by hand, requiring at most 5-10 minutes to compute
the probability of each weather element whose forecast is desired. Under more permissive circumstances,
the forecasts could be prepared automatically by a microprocessor or a programmable hand
calculator. The equivalent Markov model for forecasting the ceiling at Kelly AFB, for example,
fits easily into today's rugged Hewlett-Packard 67 programmable calculator. In developed weather
support facilities such as the Tactical Weather System (TWS) or Automated Weather Dissemination
System (AWDS), today's small, highly reliable minicomputers would not be challenged at all by the
simple computational tasks involved in making single station forecasts by the methods we have
described.

Indeed, models such as this can form the basis of automated meteorological watch and short
range forecasting to be performed at the highly automated weather stations of the future. At
centralized weather support facilities such as the Air Force Global Weather Central (AFGWC), these
models can be expected to bridge the gap between initial time and the period 12-24 hours later
when numerical weather prediction models finally stabilize and begin to provide useful forecast
guidance through model output statistics (MOS). With both AFGWC and the weather stations using the
same prognostic models for at least the first 12 hours or so, compatibility of forecasts among
the various facilities is insured.

18.  Summary  and  Conclusions. 

We  have  seen  that  simple,  yet  powerful  prediction  methods  of  the  Markov  type  can  successfully 
be  applied  to  the  problem  of  forecasting  the  weather  at  a station,  given  an  initial  observation 
of the weather at that station only. This is single station forecasting by means of statistical
prognostic  models. 

An equivalent Markov model due to Miller (1968) has been treated in detail and found to be
comparable to the classical Markov model but much easier to develop and apply to practical
forecasting problems. The same model is applied to a large scale prediction problem in chapter 10
of this volume. It is shown that the equivalent Markov model can make use of predictors produced
by the model output statistics (MOS) method or on the other hand can be applied in the tactical
context, where key data are often denied and single station methods are needed. In the context
of weather support at developed facilities such as the minicomputer-equipped weather stations of
the future, the equivalent Markov model is proposed as a means of generating probabilistic
forecasts that bridge the gap between the current observation and the first useful numerical prognoses.
Use of models such as this both at weather centrals and in the field would insure compatibility
of forecasts.

Decision assistance to specialized users such as mission planners and command and control
is shown to benefit from the use of so-called ancillary models, which make use of the probabilistic
predictions generated by the equivalent Markov technique.

It is shown that by using appropriate generalized operators it becomes possible to devise
a very small number of single station forecasting methods that, taken together, are capable of
generating single station (or network) forecasts for any location in the world, including those for
which local climatology is not yet available. Thus a small repertoire of equivalent Markov schemes
and an appropriate supporting system of ancillary models can adequately prepare a weather service
to meet a wide range of peacetime and wartime weather support requirements, some of which may be
difficult to anticipate and tedious to prepare for by other means.


Introduction


In  meteorology  the  ability  to  predict  the  weather  at  some  time  later  utilizing  the 
weather  parameters  that  are  measured  at  the  present  time  is  a goal  that  has  been  long 
sought  by  meteorologists.  One  problem  is  that  there  is  uncertainty  involved  as  to 
the  relationships  between  the  present  and  future  weather  parameters.  Therefore,  the 
use  of  probabilities  facilitates  the  expression  of  these  uncertainties.  The  various 
statistical  methods  of  estimating  prediction  probabilities  all  have  their  weaknesses. 
One weakness that is common to several methods is the inability to account for nonlinear
relationships of a joint type among the predictors. This chapter addresses that problem.

The definition of the prediction problem in terms of the present weather parameters
(predictors) and the weather parameters that we wish to predict at some time
in the future (predictands) is as follows:

Predictors: $\mathbf{X} = X_1, X_2, \ldots, X_p, \ldots, X_P$ (1)

Predictands: $\mathbf{Y} = Y_1, Y_2, \ldots, Y_g, \ldots, Y_G$ (2)

where X and Y are vector quantities and X1, X2, etc. represent present temperature,
clouds, winds, etc. and Y1, Y2, etc. represent temperature, clouds, winds, etc. at a
later time.

Using statistical methods, the objective is to predict f(Y|X), the conditional
distribution of Y given X. For example, using linear regression, if one wanted to
predict tomorrow's mean temperature T1, given today's mean temperature T0, f(T1|T0)
would be depicted as in Figure 1.

If the conditional distribution is known the solution is somewhat easier; if the
distribution is not known then one can perform a piecewise estimation of f(Y|X) as
depicted in Figure 2.

Some of the statistical techniques that are available to predict f(Y|X) are as
follows (Miller, 1969):

a. Contingency methods: These methods are weak because cell frequencies
become small or nonexistent when the number of predictors increases or the sample is
small.

b. Stepwise regression (Miller, 1967) works well when the predictors and
predictands are continuous and f(Y|X) is Gaussian; this method is parametric.

c. Nonparametric Discriminant Analysis (Miller, 1962): Miller (1969) mentions
that this method is very powerful, but computationally burdensome and therefore
infeasible. Nonparametric refers to the fact that the distribution is unknown
(Siegel, 1956). Additivity is assumed to hold.

d. Regression Estimation of Event Probabilities (REEP) (Miller, 1964) is a
stepwise regression procedure where the predictors and predictands are zero-one
variables. It is powerful like Nonparametric Discriminant Analysis, but the calculations
are not as burdensome and therefore it is feasible. It is also nonparametric.
Additivity is assumed in REEP.
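
The idea behind regressing a zero-one predictand on zero-one predictors can be illustrated with a single predictor. The sketch below is not the REEP procedure itself (it omits the stepwise screening and handles only one predictor); the function name and the sample data are hypothetical. It shows that an ordinary least-squares line fitted to zero-one data returns the conditional relative frequencies of the event, which is why the regression output can be read as a probability.

```python
def fit_zero_one_regression(x, y):
    """Least-squares fit of y = intercept + slope * x for zero-one x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


if __name__ == "__main__":
    # Hypothetical sample: x = 1 if the predictor condition is met now,
    # y = 1 if the event is observed at the later time.
    x = [0, 0, 0, 0, 1, 1, 1, 1]
    y = [0, 0, 0, 1, 1, 1, 1, 0]
    intercept, slope = fit_zero_one_regression(x, y)
    print("P(event | x = 0) is estimated as", intercept)
    print("P(event | x = 1) is estimated as", intercept + slope)
```

With several zero-one predictors entering additively, the fitted value is no longer guaranteed to equal a cell frequency or even to stay between 0 and 1, and joint effects among predictors are not represented; that additivity assumption is the limitation this chapter goes on to address.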


Figure 1. Conditional distribution of T1 (tomorrow's mean temperature) given T0 (today's
mean temperature), f(T1|T0).

Figure 2. Piecewise estimation of the conditional distribution f(Y|X) (Miller, 1969).


Two  binary  operations  and  an  operation  of  complementation  can  be  defined  using 
the Venn diagram. The "cap" operation, ∩, equates to the intersection of the two
subclasses or predictors and is also the Boolean logical AND operation, symbolized by
⊗. The "cup" operation, ∪, or the union, equates to everything within the two
subclasses and is the Boolean logical OR operation, which is symbolized by ⊕. The
complementation  operation  is  symbolized  by  a and  means  everything  within  the 

population outside of the subclass. Figure 4 depicts with a Venn diagram these
operations utilizing predictors X1 and X2.

A set of N predictors has $2^{2^N}$ possible Boolean functions. For the two predictors
X1 and X2, therefore, 16 possible Boolean functions exist. Table 2 depicts the 16
possible Boolean functions, their arithmetic equivalent, and their corresponding
Venn diagrams (Miller, 1969).
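
The count of Boolean functions can be checked by brute-force enumeration. In the sketch below each function is represented by its truth table; the function name is hypothetical, and the order in which the functions are generated is arbitrary and is not claimed to match the numbering of Table 2.

```python
from itertools import product


def boolean_functions(n_predictors):
    """Enumerate every Boolean function of n zero-one predictors.

    Each function is returned as a dict mapping an input tuple,
    e.g. (x1, x2), to the output 0 or 1.
    """
    inputs = list(product((0, 1), repeat=n_predictors))  # 2**N input combinations
    return [dict(zip(inputs, outputs))
            for outputs in product((0, 1), repeat=len(inputs))]


if __name__ == "__main__":
    functions = boolean_functions(2)
    print(len(functions))  # number of Boolean functions of two predictors
    for table in functions:
        print([table[(x1, x2)] for x1, x2 in product((0, 1), repeat=2)])
```

There are 2^N input combinations and each can be assigned either output, which gives 2^(2^N) functions: 16 for two predictors and 256 for three, which is why exhaustive significance testing quickly becomes impractical as N grows.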

The logical OR and logical AND Boolean operators can be defined as in Tables 3
and 4 respectively. The 13th function in Table 2 is usually defined as the exclusive
OR operator (Table 5).


Figure  4.  Cap,  cup,  and  complementation  operations  (Flegg,  1964) 

When N is large, testing the significance of every Boolean function with regard to a particular predictand would be prohibitive in time and effort. However, out of the 16 possible Boolean predictors, only seven depict situations that add information. To illustrate this fact, assume that two predictors, $X_1$ and $X_2$, are defined as follows:

$X_1$ - Cloud ceiling <200 feet
$X_2$ - Visibility <1/2 mile

If we were concerned with predicting the probability that an airfield would be below landing minimums at some later time, these two conditions would very possibly be predictors.

Using the Boolean functions from Table 2, the physical explanations would be as given in Table 6. As can be seen from the table, only three of the Boolean predictors would not be redundant and would add any information beyond that provided by the REEP formulation. They are $X_1$, $X_2$, and $X_1 \otimes X_2$. The remaining thirteen are derivable as linear functions of these three.
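The claim that the other Boolean functions are linear functions of a small basis can be checked numerically. The sketch below assumes the basis consists of an additive constant, the two predictors, and their arithmetic product (the zero-one form of the logical and); the helper names are illustrative and do not come from the report.

```python
from itertools import product

INPUTS = list(product((0, 1), repeat=2))


def arithmetic_form(f):
    """Return (c0, c1, c2, c12) with f = c0 + c1*x1 + c2*x2 + c12*x1*x2."""
    c0 = f(0, 0)
    c1 = f(1, 0) - c0
    c2 = f(0, 1) - c0
    c12 = f(1, 1) - c0 - c1 - c2
    return c0, c1, c2, c12


def reproduces(f):
    """Check that the arithmetic form matches f on all four input pairs."""
    c0, c1, c2, c12 = arithmetic_form(f)
    return all(f(x1, x2) == c0 + c1 * x1 + c2 * x2 + c12 * x1 * x2
               for x1, x2 in INPUTS)


# Loop over all 16 truth tables of two zero-one predictors.
all_ok = True
for outputs in product((0, 1), repeat=4):
    table = dict(zip(INPUTS, outputs))
    all_ok = all_ok and reproduces(lambda x1, x2, t=table: t[(x1, x2)])

print(all_ok)
print("or :", arithmetic_form(lambda x1, x2: x1 | x2))
print("xor:", arithmetic_form(lambda x1, x2: x1 ^ x2))
```

Under this assumption the logical or corresponds to X1 + X2 - X1X2 and the exclusive or to X1 + X2 - 2X1X2, so a regression that already carries X1, X2 and the product term gains nothing from the remaining functions.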

Miller (1969) used the fact that there is a partial ordering amongst the $2^{2^N}$ possible Boolean functions to develop an algorithmic method of uncovering the joint predictive information that one is looking for and to further reduce the testing for significance of the Boolean functions. This partial ordering is the lattice property.

Figure 5 depicts a non-Boolean lattice of the divisors of 12 and illustrates the idea of partial ordering. All numbers connected by lines moving upward in the figure are divisors of the higher numbers. In a Boolean lattice the divisor property is not applicable, but a subset property can be depicted in a similar manner, as in Figure 6. Each node on this lattice of the predictors $X_1$ and $X_2$ represents a subset of the nodes connected to it upwards in the lattice. For example, referring to the numbers of the Boolean functions in Table 2 which are shown on Figure 6, Function 11 is a subset of Functions 2, 13, and 5. These functions are in turn subsets of Functions 3 and 8, 3 and 9, and 8 and 9, respectively. Figure 7 depicts the same Boolean lattice in zero-one notation.
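The subset relations quoted above can be verified mechanically: one Boolean function is a subset of another when it is never "on" where the other is "off". The sketch below is illustrative only; it assumes the function numbers carry the physical meanings listed in Table 6 (X1 for ceiling <200 ft, X2 for visibility <1/2 mi), and the names are not from the report.

```python
from itertools import product

INPUTS = list(product((0, 1), repeat=2))

# Function numbers as read from the physical explanations in Table 6.
FUNCTIONS = {
    2: lambda x1, x2: x2,                   # visibility low
    3: lambda x1, x2: x1 | x2,              # ceiling low or visibility low
    5: lambda x1, x2: 1 - x1,               # ceiling not low
    8: lambda x1, x2: (1 - x1) | x2,        # ceiling not low or visibility low
    9: lambda x1, x2: (1 - x1) | (1 - x2),  # ceiling not low or visibility not low
    11: lambda x1, x2: (1 - x1) & x2,       # ceiling not low and visibility low
    13: lambda x1, x2: x1 ^ x2,             # exclusive or
}


def is_subset(f, g):
    """True when f is 'on' only where g is also 'on'."""
    return all(f(x1, x2) <= g(x1, x2) for x1, x2 in INPUTS)


for lower, uppers in {11: (2, 13, 5), 2: (3, 8), 13: (3, 9), 5: (8, 9)}.items():
    for upper in uppers:
        print(lower, "is a subset of", upper, ":",
              is_subset(FUNCTIONS[lower], FUNCTIONS[upper]))
```

The relation is only a partial ordering: Functions 2 and 5, for example, are not comparable, since neither is contained in the other.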

The basic principle underlying the SLAM is that if a Boolean node in the lattice is significantly related to a particular predictand, one must decide whether to proceed up or down in the lattice in order to obtain more information or more refined information. If the Boolean function is significantly related to the predictand, it may be reflecting more important sources of information based upon its position in the lattice (Miller, 1969).
2. Nonlinearity


One weakness is common to all of the statistical methods discussed earlier except the contingency methods: nonlinear information contained in the joint relationships among the predictors must be known a priori. In linear relationships the effects of the predictors are additive, while in nonlinear relationships the effects are not additive (Siegel, 1956). Tukey (1949) discusses nonadditivity and suggests a test for it.

An example best illustrates nonadditivity. Suppose that one has the following equation:

$$\mathrm{Prob}(Y=1) = .2 + .2X_1 + .2X_2 \qquad (3)$$

where $X_1$ and $X_2$ are zero-one variables. Their relationship to the probability that $Y=1$ is shown in Table 1. The effects of the two predictors in this table may have been determined by using contingency methods. If neither $X_1$ nor $X_2$ is "on" (equal to one), i.e., neither occurred and both are zero, the probability that $Y=1$ is .2. Likewise, if either $X_1$ or $X_2$ alone is "on", the probability is .4 (from Equation 3). However, if both $X_1$ and $X_2$ are "on", the probability that $Y=1$ is .9, while Equation 3 would give a probability of .6. This relationship is nonadditive, or nonlinear, because the joint effect of the predictors is not the sum of their separate effects. Had Equation 3 been of the form

$$\mathrm{Prob}(Y=1) = .2 + .2X_1 + .2X_2 + .3X_1X_2 \qquad (4)$$

then the effects of the terms, with the product $X_1X_2$ treated as a third zero-one predictor, would have been additive.
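The arithmetic of this example can be laid out in a few lines. The sketch below simply evaluates Equations 3 and 4 at the four predictor combinations and compares them with the Table 1 probabilities; the function and variable names are illustrative.

```python
# Table 1 probabilities, keyed by (x1, x2).
TABLE_1 = {(0, 0): 0.2, (0, 1): 0.4, (1, 0): 0.4, (1, 1): 0.9}


def equation_3(x1, x2):
    """Additive form without a joint term."""
    return 0.2 + 0.2 * x1 + 0.2 * x2


def equation_4(x1, x2):
    """Same form with the product x1*x2 added as a third predictor."""
    return 0.2 + 0.2 * x1 + 0.2 * x2 + 0.3 * x1 * x2


for (x1, x2), p in TABLE_1.items():
    print(x1, x2, p, round(equation_3(x1, x2), 3), round(equation_4(x1, x2), 3))
```

Equation 3 agrees with the table in three of the four cells and falls short by .3 only when both predictors are "on", which is exactly the amount carried by the product term in Equation 4.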

Table 1. Relationships of Predictors $X_1$ and $X_2$ to Prob(Y=1).

                          Predictor $X_2$
                          0         1

Predictor $X_1$    0      .2        .4
                   1      .4        .9

3. Screening Lattice Algorithm (SLAM)

Miller (1969) developed an algorithm to overcome this weakness in REEP. This algorithm, the screening lattice algorithm (SLAM), can be used to find joint predictive information among the predictors.

The  SLAM  utilizes  Boolean  algebra  and  lattices  to  find  the  joint  relationships. 

In  order  to  use  Boolean  algebra,  all  of  the  predictors  must  be  considered  as  zero-one 
variables.  Since  the  REEP  predictors  are  zero-one  variables,  this  algorithm  can 
easily  be  utilized  in  conjunction  with  REEP. 

Flegg  (1964)  offers  an  explanation  of  the  Boolean  algebra  and  Boolean  functions 
that  the  SLAM  employs.  Utilizing  Boolean  algebra  one  can  study  the  significance  of 
relations  between  classes  (Flegg,  1964). 

If one has two predictors, $X_1$ and $X_2$, which are elements of the total population of predictors, they and their relationships can be depicted using a Venn diagram, and they can be considered as subclasses of the total population. This pair of predictors is shown in Figure 3.


Figure  3.  Venn  Diagram 




Table 3. Logical or Operator (Miller, 1969).

  $\oplus$        $X_2 = 0$    $X_2 = 1$

  $X_1 = 0$          0            1
  $X_1 = 1$          1            1

Table 4. Logical and Operator (Miller, 1969).

  and             $X_2 = 0$    $X_2 = 1$

  $X_1 = 0$          0            0
  $X_1 = 1$          0            1

Table 5. Exclusive or Operator (Miller, 1969).

  Exclusive or    $X_2 = 0$    $X_2 = 1$

  $X_1 = 0$          0            1
  $X_1 = 1$          1            0


Table 6. Boolean Predictors for Ceiling <200 feet ($X_1$) and Visibility <1/2 mile ($X_2$).

Boolean
Function   Physical Explanation                          Remarks

  1.       Ceiling <200 ft
  2.       Visibility <1/2 mi
  3.       Ceiling <200 ft or Visibility <1/2 mi
  4.       Ceiling <200 ft and Visibility <1/2 mi
  5.       Ceiling ≥200 ft                               Redundant with No. 1 (Use 1 or 5)
  6.       Visibility ≥1/2 mi                            Redundant with No. 2 (Use 2 or 6)
  7.       Ceiling <200 ft or Visibility ≥1/2 mi
  8.       Ceiling ≥200 ft or Visibility <1/2 mi
  9.       Ceiling ≥200 ft or Visibility ≥1/2 mi         Redundant with No. 4 (Use 4 or 9)
 10.       Ceiling <200 ft and Visibility ≥1/2 mi        Redundant with No. 8 (Use 8 or 10)
 11.       Ceiling ≥200 ft and Visibility <1/2 mi        Redundant with No. 7 (Use 7 or 11)
 12.       Ceiling ≥200 ft and Visibility ≥1/2 mi        Redundant with No. 3 (Use 3 or 12)
 13.       (Ceiling ≥200 ft or Visibility ≥1/2 mi)
           and (Ceiling <200 ft or Visibility <1/2 mi)
 14.       (Ceiling ≥200 ft or Visibility <1/2 mi)       Redundant with No. 13 (Use 13 or 14)
           and (Ceiling <200 ft or Visibility ≥1/2 mi)
 15.                                                     Gives no meaningful information.
 16.                                                     Gives no meaningful information.


Figure 6. The Boolean Lattice of Two Variables (Miller, 1969).

The  resulting  algorithm  (SLAM)  is  as  follows  (Miller,  1969): 

a.  Run the straight REEP on the data (X on Y).

b.  Select the X's which are significantly related (singly) to Y.

c.  Select significant exclusive ors among the unselected pairs of X's from step b.

d.  Generate all of the Boolean ands, ors, and nots between the pairs selected in steps b and c.

e.  Select those generated in step d that add significant information over the REEP variables of step a and the raw pair making up the generated variable.

f.  Continue generating and selecting until convergence.

g.  Run a final REEP on a set of predictors consisting of all variables selected as significant in any of the above steps.
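The generating part of the steps above (forming Boolean combinations of zero-one predictors) can be sketched as follows. This is only an illustration of the generation step; the significance testing and the REEP regression are deliberately left out, and the function name and data layout are assumptions rather than anything taken from Miller's program.

```python
from itertools import combinations


def boolean_candidates(columns):
    """Generate Boolean combinations of zero-one predictor columns.

    columns maps a predictor name to its list of 0/1 observations.
    Returns a dict with the complement of every predictor and the
    and, or, and exclusive or of every pair of predictors.
    """
    generated = {}
    for name, values in columns.items():
        generated["not " + name] = [1 - v for v in values]
    for (a, xa), (b, xb) in combinations(columns.items(), 2):
        generated[f"{a} and {b}"] = [u & v for u, v in zip(xa, xb)]
        generated[f"{a} or {b}"] = [u | v for u, v in zip(xa, xb)]
        generated[f"{a} xor {b}"] = [u ^ v for u, v in zip(xa, xb)]
    return generated


data = {"X1": [0, 0, 1, 1], "X2": [0, 1, 0, 1]}
for name, values in boolean_candidates(data).items():
    print(name, values)
```

In a full implementation each generated column would be screened for significance before being kept, and the generation would be repeated on the retained variables until no new ones are selected.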

According to Miller (1964), the following procedure can be used to select the predictors in the REEP procedure: From the set of the P individual predictors, select the first predictor, $X_1$, such that it contributes most significantly (better than chance) with regard to one of the G predictands (see Equations 1 and 2); that is, the predictor $X_1$ has a computed F distribution statistic that is larger than any one of the G computed statistics for the remaining P-1 predictors.

In the same manner the other predictors $X_2, \ldots, X_r$ can be selected when they are considered in conjunction with the preceding predictors. The selection process continues until $X_{r+1}$ is not satisfactorily significant. At each of the r selection stages the G possible computed F statistics are compared with a critical value of $F_\alpha$. The size $\alpha$ is given by the equation (Miller, 1964)

$$\alpha = \frac{1}{20\,[P - (S - 1)]} \qquad (5)$$

where S is the number of predictors that have already been selected.

Likewise, a selection procedure for selecting the Boolean predictors can use the F test where the size is given by Equation 5.


4.  Applications  of  SLAM 

Simulated  experiment:  To  test  the  algorithm,  a set  of  data  was  generated  which 
contained  a preassigned  relationship.  The  objective  was  to  determine  how  closely 
the  original  relationship  could  be  reconstructed. 


Specifically, one hundred data vectors, each consisting of ten thousand elements, were generated using random numbers, as shown in Table 1.

Table 1

                  Variables
Observation       $X_1$           $X_2$           ...   $X_{100}$           $Y$

1                 $x_{1,1}$       $x_{1,2}$       ...   $x_{1,100}$         $y_1$
2                 $x_{2,1}$       $x_{2,2}$       ...   $x_{2,100}$         $y_2$
3                 $x_{3,1}$       $x_{3,2}$       ...   $x_{3,100}$         $y_3$
...
10000             $x_{10000,1}$   $x_{10000,2}$   ...   $x_{10000,100}$     $y_{10000}$
The individual observations are either zero or one. A one was assigned to an observation if a selected random number r (0 < r < 1.00) was exceeded by the percentage desired for that variable. The 100 percentages were extracted from a random number table. Once the data matrix of Table 1 was constructed, the following four Boolean variables were derived:

- (X^  « Xj)  « itj 
Zj  - (x^  « x^) 

Zj  = (Xj  • X^)  « (Xj^  « Xj) 


Further, a linear function was constructed which has all the features of a REEP equation, i.e., it estimates the probability that the predictand Y's observation value is a one. For this simulated experiment the "REEP" equation is


Logical form:

$$\mathrm{Prob}(Y=1) = .399 + .239Z_1 + .137Z_2 + .257Z_3 - .384Z_4$$

or

Arithmetical form:

$\mathrm{Prob}(Y=1) =$


« 1 

0 1 

.399  - .144X 


2 - •239X^X2 


- .239X,X,  + .137X,X. 

13  14 


.257XjXj-  .239X2X3+ 

A9  A30 


. 257X_X  .+ 
2 4 


. 239X3^X3X3 


. 257X, X,X . + 
12  4 


From this equation each of the 10,000 observations of the Y vector was determined. Namely, a random number between zero and one was drawn, and if its value was exceeded by the equation estimate, a one was inserted in the Y vector for that observation; otherwise a zero was inserted. The data were then subjected to a computer program which carried out the steps in the algorithm.
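The data-generation scheme just described can be sketched in Python. This is a minimal illustration using the standard library random module, not the program used in the report: the function names, the seed, and the one-term probability equation are hypothetical, and only four predictors are generated here instead of one hundred.

```python
import random


def simulate(n_obs, percentages, prob_y, seed=0):
    """Generate zero-one predictors and a zero-one predictand.

    percentages[j] is the desired fraction of ones for predictor j.
    prob_y(row) returns the probability that Y = 1 for one observation.
    """
    rng = random.Random(seed)
    x_rows, y = [], []
    for _ in range(n_obs):
        # A one is assigned when the random number is exceeded by the
        # percentage desired for that variable.
        row = [1 if rng.random() < p else 0 for p in percentages]
        x_rows.append(row)
        # The same rule is applied to the equation estimate to obtain Y.
        y.append(1 if rng.random() < prob_y(row) else 0)
    return x_rows, y


def example_prob(row):
    # Hypothetical equation with a single Boolean "and" term; it is not
    # the full equation used in the report's experiment.
    return 0.399 + 0.137 * (row[0] & row[3])


x_rows, y = simulate(10_000, [0.5, 0.5, 0.5, 0.5], example_prob, seed=1)
print(len(x_rows), sum(y) / len(y))
```

Because the preassigned relationship is known exactly, the coefficients recovered by any fitting procedure can be compared directly with the ones used to generate Y.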

The  results  are  shown  below  in  arithmetical  form. 

Coefficient   Term                       Constructed   Reconstructed

$a_0$         (Additive Constant)           .399           .399
$a_1$                                      -.144          -.149
$a_2$         ($X_1X_2$)                   -.239          -.245
$a_3$         ($X_1X_3$)                   -.239          -.245
$a_4$         ($X_1X_4$)                    .137           .123
$a_5$         [illegible]                  -.257          -.248
$a_6$         ($X_2X_3$)                   -.239          -.245
$a_7$         ($X_2X_4$)                    .257           .277
$a_8$         ($X_1X_2X_3$)                 .239           .245
$a_9$         ($X_1X_2X_4$)                -.257           .000
$a_{10}$      [illegible]                   .257           .000

It should be pointed out that the last two terms are not as different as might be interpreted at first glance. The simultaneous event $X_1X_2X_4$ occurred 27 times in 10,000, while the four-predictor event of the last term occurred 15 times. Thus, the lack of fit affects only 12 observations out of 10,000, since the coefficients on the last two terms nullify each other. This lack of fit may be attributed to the fact that statistical significance was required of all Boolean functions selected. For the sample size used, this small contribution failed to exceed the chosen significance level ($\alpha = .05$).*

It was concluded that reconstruction was performed to a satisfactory degree.

Further  experiments:  Three  data  bases  were  used  to  test  SLAM.  Each  of  these 
consisted  of  real  world  observations.  The  first  of  these,  the  Travelers  automobile 
insurance  data  and  the  problem  requiring  solution,  motivated  the  development  of 
SLAM  under  the  Travelers  Environment  and  Man  contract. 


(1)  Data  Base:  The  Travelers  Insurance  Companies  real-time  data  files. 

Given: The predictors were the items recorded on the applications of 8,000 insureds, such as sex, age, state of residence, use class, past driving record, occupation, etc. The predictand selected for study was the event that one or more claims would be submitted by the insured over the following twelve month period.

Objective: Predict the probability that the insured would submit one or more claims over the next twelve month period.

Solution: The equation determined from SLAM and the associated selected predictors were:


Travelers Insurance Companies Data

Coefficients   Selected Predictors

  .073         (Additive Constant)
  .145         NE Resident and Use Class 87
  .137         Vermont Resident
  .064         Age 68-72 and Small Land Area State of Residence
  .068         NE Resident and Use Class 37
  .056         Use Class 97
  .044         Washington, D.C. Resident
 -.044         Use Class 88
 -.033         Idaho Resident
 -.033         Low Population to Land Area State of Residence
  .024         Occupation (Skilled, Unskilled, Technical, Factory Worker)
  .024         State of Residence has Mandatory Inspection
  .023         Use Class 87
  .015         Accident Prior to 1966
 -.015         Low Population Growth State of Residence

Independent Data Results:

                  Base      REEP      SLAM

Information**     .2871     .2838     .2792
P Score**         .1497     .1496     .1495
Hits              7348      7348      7348

Discussion: The predictors selected appear to be reasonable. The importance of each predictor can be interpreted as follows: When a predictor's condition is satisfied (its value is equal to unity), the amount of the corresponding coefficient is added to the additive constant (the first coefficient shown). Should the condition not be satisfied, nothing is added. Thus, a New England resident in Use Class 87 will have .145 added to the probability that he will be submitting one or more claims within the twelve month period in question.
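The interpretation above amounts to summing the coefficients of the satisfied conditions. The sketch below illustrates this with the additive constant (.073) and the one coefficient quoted in the discussion (.145); the function names are illustrative, and the average claim of 500 dollars is an assumed figure used only to show how an added probability translates into an added expected cost.

```python
def claim_probability(additive_constant, coefficients, satisfied):
    """Add the coefficient of every satisfied zero-one predictor to the constant."""
    return additive_constant + sum(coefficients[name] for name in satisfied)


def added_expected_cost(coefficient, average_claim):
    """Expected extra claim cost implied by one satisfied predictor."""
    return coefficient * average_claim


# Only the Boolean predictor quoted in the discussion is included here;
# any other predictor the same insured satisfies would add its own coefficient.
coefficients = {"NE Resident and Use Class 87": 0.145}

p = claim_probability(0.073, coefficients, ["NE Resident and Use Class 87"])
print(round(p, 3))
print(round(added_expected_cost(0.145, 500.0), 2))
```

A predictor with a negative coefficient works the same way, lowering the estimated probability when its condition is satisfied.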


*This  5%  level  of  significance  has  been  adjusted  for  the  number  of  predictors 
screened.  For  a discussion  of  this  point  see  Miller  (1962). 

**Smaller  values  are  preferred. 
If an average claim is, say, $500, then a New England applicant satisfying Use Class 87 (unmarried; male; 21-24 years old; pleasure, work, or business use but not farmer; mileage immaterial) should have his pure premium increased by $72.50 to offset the added risk he represents to the insurance company.

An independent test was made to determine the reliability of the selected predictors and of their coefficients. For an independent sample of 8,000 observations, the equation verified its reliability and skill over straight REEP.

(2) Data Base: The Connecticut State Highway Department accident data.

Given: The predictors were the characteristics recorded on 5,000 car-car accidents in Connecticut. These included such items as: location, hour of day, day of year, age of driver considered the victim, model year of automobiles, contributing factor, type of vehicle and violation.

Objective: Predict the probability that the driver considered to be at fault was twenty years old or less.

Solution: The equation determined from SLAM and the associated selected predictors were:
Connecticut State Highway Department Data

Coefficients   Selected Predictors

  .261         (Additive Constant)
 -.218         Driver Under the Influence
  .159         Hardtop and Darkness with Highway Illuminated
  .142         Convertible
 -.142         Convertible and Darkness with Highway Illuminated
  .082         Vehicle Type Unknown (Other than Sports Car)
 -.078         Model Year 1962
 -.073         Daylight
  .073         Daylight and 8 PM - 9 PM

Independent Data Results:

                 Base      REEP      SLAM

Information*     .5421     .5241     .5214
P Score*         .3448     .3441     .3434
Hits             3189      3189      3189


Discussion: Interpretations can be made regarding the importance of the selected predictors. For example, since most problem drivers are alcoholics and since few young drivers are alcoholics, it seems reasonable that .218 would be subtracted from the probability that the driver at fault is twenty years old or less, given that the driver was under the influence. The independent sample of 5,000 cases confirms the stability and effectiveness of the SLAM procedure.

(3)  Data  Base:  United  States  Weather  Bureau  records  for  Hartford,  Connecticut. 


Given: The predictors were the observed weather elements at forecast time, such as: temperature, cloud types, ceiling height, visibility, wind, humidity, weather (haze, fog, rain, snow, etc.), time of day and day of year. The predictand chosen was the event that there would be no weather one hour after forecast time.

Objective: Predict the probability that there would be no weather (no haze, no fog, no rain, no snow, etc.) one hour in advance.

*Smaller values are preferred.

Solution: The equation determined from SLAM and the associated selected predictors were:


Hartford, Connecticut Weather Data

Coefficient   Selected Predictors

  .078        (Additive Constant)
  .166        NO WEATHER
  .045        CIG UNLIMITED
  .058        CIG 1000-5000 feet
  .284        VIS 15 miles
  .266        VIS 10-14 miles
  .191        VIS 7-9 miles
  .161        JANUARY
  .052        AUGUST
 -.083        RLH 90-100%
 -.064        WDR SE
  .058        WSD 10-18 knots
  .152        (NO WEATHER) (CTL SC)
 -.161        (NO WEATHER) (CTL FS)
  .005        (NO WEATHER) (CTU NS, ST)
 -.161        (NO WEATHER) (JANUARY)
 -.161        (CTL FS) (JANUARY)
 -.266        (CTL FS) (VIS 10-14 miles)
 -.116        (CIG 1000-5000 feet) (WSD 10-18 knots)
 -.266        (VIS 10-14 miles) (LIGHT RAIN)
  .005        (NO WEATHER) (WSD 2-8 knots) (CTU NS, ST)
  .161        (NO WEATHER) (WSD 2-8 knots) (CTL FS)
  .161        (NO WEATHER) (CTL FS) (JANUARY)
  .266        (WSD 2-8 knots) (CTL FS) (VIS 10-14 miles)
  .161        (WSD 2-8 knots) (CTL FS) (JANUARY)
  .266        (CTL FS) (VIS 10-14 miles) (LIGHT RAIN)
  .297        (CIG 1000-5000 feet) (DBT 71-80°) (VIS 3-6 miles)
 -.161        (WSD 2-8 knots) (NO WEATHER) (JANUARY) (CTL FS)
 -.266        (WSD 2-8 knots) (CTL FS) (VIS 10-14 miles) (LIGHT RAIN)

Discussion: A note of caution needs to be raised in this example. The selected predictors shown and their corresponding coefficients resulted from a very liberal test of significance. It turns out that no Boolean predictors showed significance at the usual 5% level over and above the predictors selected using straight REEP. The predictors shown are being presented only for illustrative purposes. The fact that none of the Booleans showed significance at the 5% level for such a short period weather forecast seems highly plausible. A more reasonable data base for uncovering Boolean predictors would have been one for forecasts of longer range, but unfortunately none was readily available.

5. Conclusion

This report has presented one method for dealing with nonlinearity or nonadditivity in predicting meteorological events with statistical methods. This method, the SLAM, has the following features (Miller, 1969):

a.  The SLAM is nonparametric; no information about the underlying distribution is required.

b.  The SLAM is multivariate.

c.  The SLAM is nonlinear.

d.  The SLAM handles qualitative variables easily.

e.  The results are interpretable.

f.  Operational  applications  are  easy  to  perform. 

g.  The  SLAM  handles  missing,  erroneous,  or  incomplete  data  systematically. 

h.  The  SLAM  processes  large  numbers  of  variables  efficiently. 

i.  The  SLAM  processes  large  numbers  of  observations  efficiently. 

Further  tests  of  the  SLAM  in  meteorological  applications  should  be  performed  to 
determine  its  superiority  or  non-superiority  over  other  methods. 


CHAPTER  9 


DELPHI  TECHNIQUE 
by  Lt  Col  James  W.  Taylor 

1.  Introduction. 

For the systems analyst, such techniques as regression analysis, linear programming, and others too numerous to mention are the tools of the trade. However, there is a class of problems for which the purely quantitative methods available to the analyst are either not appropriate or have not yet been developed. For example, in the area of politics and strategy for military planning, the analyst must assess the probable intentions of the enemy by weighing and evaluating a tremendous amount of nonquantifiable data. An example is the development of the U. S. military role in space (Frye, 1968, p. 311). One method that might be used to address such problems is the Delphi technique.

2.  What  Is  Delphi? 

Developed by Olaf Helmer (1963), the Delphi technique is a methodology used to arrive at a consensus of opinion. "Experts" give their opinions and are provided with feedback, which permits them to evaluate their own opinions in light of the opinions of others and to make adjustments in their evaluations. After several iterations, a consensus of opinion is generally achieved, which, when quantifiable, has been found to be very accurate.

The originators of the Delphi technique believed it to be a possible method of forecasting future events. In fact, one of the first applications of Delphi, in 1948, was to use the "expert" judgements or opinions of horse racing handicappers to obtain a better estimate of a horse's chance of winning (Quade, 1968, p 334). Since then it has been used by several different organizations in a wide variety of applications. For example, Corning Glass Works used the Delphi technique to forecast electronic sales in the consumer oriented, industrial, and governmental business sectors five and ten years in the future (Johnson, 1976, pp 52-56). The U. S. Air Force used this technique in an attempt to quantify the distribution of quality required by the Air Force among the nonprior service accessions. It was felt that recruiting only enlisted personnel having a college degree (the high extreme) would lead to inefficiency and boredom, while recruiting only non-high school graduates would not permit the Air Force to carry out its mission. The Delphi technique was applied in an attempt to quantify a distribution of quality that would enable the Air Force to do its job (Taylor, et al, 1972, p 44).

3.  How A Delphi Experiment Is Conducted.

With  this  brief  overview  of  what  the  Delphi  technique  is,  it  is  appropriate  to  explain  how  a 
Delphi  experiment  would  be  conducted.  Of  course,  the  problem  to  be  answered  must  have  been 
determined  and  the  decision  to  use  the  Delphi  technique  made  in  advance. 

The  first  step  would  be  to  select  the  panel  of  "experts",  personnel  with  some  knowledge  and/or 
experience  in  the  issues  to  be  addressed.  But  how  many  experts  are  needed?  What  background 
qualifies  a person  as  an  "expert"? 

The  first  question  is  easier  to  answer.  Research  has  shown  that  10  to  15  panel  members  are 
generally  sufficient  to  furnish  reliable  results.  Fewer  than  10  members  may  not  provide  adequate 
information  and  feedback  to  obtain  reliable  results,  while  more  than  15  may  seriously  complicate 
the  handling  of  the  data  (Johnson,  1976,  p 52). 

The second question, "What constitutes an expert?" is more difficult. One method of sidestepping the answer is to evaluate, or rank order, the panel members by their demonstrated "expertise" in the field, and then assign relative weights to their inputs after the responses are in (Helmer, 1963, p 5). A method that might be used to rank order the panel members--to be able to say that Mr. X is more of an "expert" than Mr. Y--would be to have each potential panel member name the person he believes is the most knowledgeable in the field. By determining those who received the most votes, a panel of highly qualified experts could be selected (Nat Def Uni, 1976, p 2). Another method might be for those conducting the Delphi experiment to rank order the potential panel members by their past performance: How correct have they been in previous forecasts?
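The nomination method described above reduces to counting votes. A minimal sketch follows; the function name, the sample names, and the panel size are hypothetical and serve only to illustrate the tally.

```python
from collections import Counter


def select_panel(nominations, panel_size):
    """Return the panel_size most frequently nominated names.

    nominations holds one name per vote cast by the potential panel members.
    """
    counts = Counter(nominations)
    return [name for name, _ in counts.most_common(panel_size)]


votes = ["Expert A", "Expert B", "Expert A", "Expert C", "Expert B", "Expert A"]
print(select_panel(votes, 2))
```

The vote counts themselves could also serve as the relative weights mentioned above, although how ties and weights should be handled is a choice left to those conducting the experiment.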

However,  Olaf  Helmer's  original  Delphi  applications  did  not  consider  the  relative  degree  of 
expertise  of  the  panel.  What  is  required  is  that  the  viewpoints  of  all  panel  members  have  a 
chance  of  being  heard. 

Another important point concerning the panel and the Delphi technique is that of nonattribution, or anonymity. Panel members must be free to state the reasons for their beliefs or choices without fear of ridicule (Quade, 1968, p 334).

A final question concerning panel membership: Are there enough resources within the organization from which a panel of experts can be drawn, or must we go outside the organization; and are these experts willing to participate? Willingness to participate can be enhanced if, at the beginning of the experiment, the benefits and rewards to be gained by the panel members are explained. To accomplish this, three factors should be adequately covered at the start:

1.  The  purpose  of  the  study, 

2.  An explanation of the Delphi technique and why it was selected, and

3.  The  benefits  to  the  panel  members  of  participation  in  the  experiment. 

As  Jeffrey  L.  Johnson  pointed  out  (Johnson,  1976,  p 53),  the  Delphi  technique  is  based  upon 
feedback.  The  panel  members  have  the  benefit  of  the  opinions  of  other  experts  and  will  be 
increasing  their  own  knowledge.  If  they  are  unfamiliar  with  the  Delphi  technique,  they  will  be 
able  to  learn  it,  and  will  possibly  be  able  to  apply  it  in  doing  their  own  forecasting  in  the  future. 
These  benefits,  by  themselves,  may  be  sufficient  to  encourage  enough  participation  from  within 
the  organization.  If  not,  the  only  recourses  available  are  to  go  outside  the  organization,  and, 
possibly,  have  to  pay  for  the  experts'  knowledge  and  opinions. 

4.  A Delphi Example.

A graduate  meteorological  course  in  applied  statistics  given  by  St.  Louis  University  at  Scott 
Air  Force  Base,  Illinois  in  the  spring  of  1977,  offered  an  excellent  opportunity  to  conduct  a 
Delphi  experiment.  Most  of  the  students  in  the  class  were  Air  Force  weather  officers,  or 
officers  who  had  had  considerable  experience  in  the  weather  career  field.  In  addition,  there 
were  other  personnel  available  at  Scott  AFB  working  in  this  field,  e.g.  , the  Weather  Service  War 
Planner  in  the  headquarters  of  the  Military  Airlift  Command. 

As part of the class project, it was decided to conduct a Delphi experiment. The class members, by serving on the panel, would gain some understanding of the Delphi procedures and, at the same time, it might be possible to answer a difficult meteorological question--namely, "What factors - if any - are causing the weather to change?"

The topic is a broad one; therefore, it was necessary in the beginning to restrict the area of discussion. By weather was meant, for example, last winter's extreme cold spell in the midwest and the drought in the Pacific northwest. By change was meant a trend over a period of five to ten years (as opposed to day to day changes or changes occurring over thousands of years). So what we were interested in discovering were:

1.  Is  the  weather  changing? 

2.  If  it  is  changing,  what  factors  could  be  causing  these  changes?  and 

3.  What  is  the  relative  importance  of  each  of  these  factors  in  affecting  a change  in  the 
weather? 
With this overview of the purpose of the study, the Delphi technique was then explained to the
class.  They  would  be  asked  for  their  opinions  on  what  was  causing  the  weather  to  change.  These 
opinions  would  then  be  collected,  analyzed,  and  returned  to  them  for  a next  iteration.  In  this  way, 
everyone  would  have  the  benefit  of  the  entire  class'  knowledge,  and  would  be  able  to  evaluate  their 
own  responses  in  this  light. 

The  next  several  pages  are  the  questionnaires  which  were  submitted  to  the  class.  The  first 
questionnaire  on  page  9-3  served  to  elicit  fresh  ideas  on  what  factors  may  contribute  to  affecting 
a change  in  the  weather.  Thirty-four  different  factors  were  listed  by  the  class  in  response  to  that 
questionnaire.  These  are  listed  in  the  second  iteration  questionnaire  on  pages  9-4  and  9-5. 

Of  particular  note  is  item  number  34  on  page  9-5.  Well  over  half  of  the  class  indicated  that 
there  had  been  no  significant  change  in  the  weather;  that  the  unusual  cold  or  drought  spells  that 
have  been  experienced  are  well  within  accepted  probability  limits.  However,  at  the  same  time, 
those  who  indicated  that  there  had  been  no  significant  change  in  the  weather  also  listed  other 
factors  (from  the  first  33  on  the  list)  that  were  causing  the  weather  to  change. 

Therefore,  in  order  to  continue  with  the  experiment,  to  more  fully  demonstrate  the  Delphi 
technique,  item  34  was  omitted  from  any  further  consideration.  It  would  have  been  fruitless  to 
ask  what  is  causing  the  weather  to  change  if  all  agree  that  the  weather  is  not  changing.  This 
could  be  the  real  conclusion  of  the  study.  The  variability  of  the  weather  that  has  been  experienced 
in  the  last  few  years  may  well  be  just  normal  variation. 

However,  to  continue  the  Delphi  experiment  further,  it  was  decided  to  delete  this  response. 


WHAT  IS  CAUSING  THE  WEATHER  TO  CHANGE? 

A DELPHI  APPLICATION 

This questionnaire has been developed to elicit your ideas and
opinions on what factors have an influence on changing the
weather. This questionnaire is being used, rather than a
face-to-face confrontation over a conference table, to enable you to
express all of your ideas, even those you consider half-baked
or far out. In other words, it's a brainstorming session without
fear of ridicule and an iterative process with feedback.

We have two primary objectives in this effort. The first will
be to identify those factors that may be causing a change in
the weather (for example, exhausts from the internal combustion
engine); while the second objective will be to measure or
quantify the contribution of each factor in changing the weather.
(For example, industrialization of agriculture--permitting vast
areas of land to be denuded of vegetation--may be a more
significant factor than automobile emissions in affecting the
weather.)

With  this  as  an  overview,  we  request  that  you  list  in  the  space 
below  those  factors  that  you  believe  may  be  influencing  the 
weather.  The  results  will  be  analyzed,  tabulated  and  returned 
to  you  during  the  next  class  meeting. 
WHAT IS CAUSING THE WEATHER TO CHANGE?
A DELPHI APPLICATION


2. The following tabulation represents those factors you have
indicated which may be influencing the weather. We would like
your judgement of the relative importance of each of these
factors. Specifically--in the short range of five to ten years--
how important was each factor in affecting a change in the
weather? For each factor, please check only one box; consider
each factor separately.


UNIMPORTANT 


FACTORS  (Cont) 

11.  Changes  in  ozone  layer. 

12.  Shift  in  long  wave  pattern  that 
caused  persisting  high  pressure 
ridge  in  Pacific/California  region. 

13.  Industrialization. 

14.  Significant  depreciation  of  total 
available  potential  energy  contained 
on  the  earth. 


15. Rocket fuel effluents.

16.  Nuclear  testing  in  the  atmosphere. 

17.  Underground  nuclear  tests. 

18.  Radioactive  isotopes  in  atmosphere. 

19.  Change  in  position  of  polar  jet 
stream. 

20. Pollution--causing increased number
of condensation nuclei in the atmosphere.

21. Changes in earth's albedo caused by
irrigation; land denuded of vegetation.

22. Secret Soviet (Russian/Chinese)
weapons.

23.  Increased  airline  flights. 

24.  Satellites. 

25.  Stratospheric  warming/cooling. 

26.  Change  in  infrared  flux  density  in 
lower  atmosphere. 

27. Increased coverage of land with concrete
and asphalt, depleting aspiration
sources of moisture.

28.  Vietnam  War. 

29.  Spray  cans  destroying  the  ozone  layer. 

30.  Weather  modification  efforts. 

31.  Volcanic  ash. 

32.  Variation  in  the  earth's  elliptical 
orbit  around  the  sun. 

33.  Variation  in  the  inclination  of  the 
earth's  axis  with  respect  to  the 
elliptical  orbit  about  the  sun. 

34. No change (or change within normal
variation).




The second questionnaire was then given to the class. (At this point, item 34 was still being
considered.) The class members--the Delphi panel--were then asked to evaluate the relative
importance of each of the 34 factors in its ability to affect a change in the weather. They were
asked to rate each factor either "Very Important," "Important," "Slightly Important," or
"Unimportant." Table 1 summarizes the results. Two-thirds of the class (14 out of 21) believed
that there has been no change in the weather, rating item 34 as either "Important" or "Very
Important." On the other hand, one of the ideas submitted on the first iteration was rated
"Unimportant" by the entire panel--number 24, satellites. This demonstrates one of the good
features of Delphi--it allows all participants to express an idea, no matter how farfetched,
without fear of ridicule.

DISTRIBUTION OF RESPONSES ON THE SECOND ITERATION QUESTIONNAIRE

                  Response
Item No.      1     2     3     4

    1         3     8     8     2
    2         2     5     7     7
    3         0     0     4    17
    4         4     6     2     9
    5         1     6     7     7
    6         1     5     2    13
    7         0     0     2    19
    8         0     5     9     7
    9         0     2     5    14
   10         0     3    12     6
   11         2     7     7     5
   12         4     5     4     8
   13         0     7    11     3
   14         1     1     2    17
   15         0     1     4    16
   16         0     3     6    12
   17         0     1     4    16
   18         0     2     5    14
   19         5     5     3     8
   20         4     6     6     5
   21         0     5     8     8
   22         0     0     5    16
   23         0     1    10    10
   24         0     0     0    21
   25         3     4     4    10
   26         0     3     6    12
   27         0     3     7    11
   28         0     0     1    20
   29         0     2     8    11
   30         0     2     5    14
   31         1     5     5    10
   32         3     2     3    13
   33         3     2     2    14
   34         9     5     3     4

TABLE 1
The next problem was to evaluate these subjective ratings to determine which were the more
important factors in influencing a change in the weather. Several weighting schemes were applied
to the different subjective judgements. The following factors were the most important: pollution,
a shift in the long wave pattern, change in the ozone layer, change in ocean currents, solar
cycles, the heat island effect of large cities, changes in the earth's albedo, and geothermal heat
source variations over a large area of the earth.
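
The report does not say which weighting schemes were applied. As a minimal sketch, assume one simple scheme in which the four ratings earn 3, 2, 1, and 0 points; these weights are an assumption for illustration, not the author's. The sketch also assumes that response columns 1 to 4 of Table 1 run from "Very Important" to "Unimportant," which agrees with what the text says about items 24 and 34.

```python
# Hypothetical weights: the report does not state which schemes were used.
WEIGHTS = (3, 2, 1, 0)  # Very Important, Important, Slightly Important, Unimportant

# Response counts (columns 1-4 of Table 1) for a few items.
TABLE_1 = {
    12: (4, 5, 4, 8),    # shift in long wave pattern
    20: (4, 6, 6, 5),    # pollution (condensation nuclei)
    24: (0, 0, 0, 21),   # satellites
    34: (9, 5, 3, 4),    # no change
}


def weighted_score(counts, weights=WEIGHTS):
    """Sum of response counts multiplied by their rating weights."""
    return sum(c * w for c, w in zip(counts, weights))


def rank_items(table, weights=WEIGHTS):
    """Return item numbers sorted from highest to lowest weighted score."""
    return sorted(table, key=lambda item: weighted_score(table[item], weights),
                  reverse=True)


scores = {item: weighted_score(counts) for item, counts in TABLE_1.items()}
ranking = rank_items(TABLE_1)
```

A different choice of weights can reorder closely scored items, which may be why several schemes were tried.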

It  was  necessary  to  limit  the  number  of  important  factors  to  be  analyzed  in  the  next  iteration 
for  two  reasons.  First,  the  technique  described  by  Thomas  L.  Saaty  (1973),  using  an  eigenvalue 
analysis,  was  to  be  used  to  evaluate  the  relative  importance  or  contribution  of  these  factors. 
Unless  the  matrices  developed  are  kept  small,  the  task  of  evaluating  or  comparing  each  factor 
with  every  other  factor,  one  at  a time,  becomes  enormous.  Second,  the  computer  available  to 
the  author  would  solve  the  eigenvalue  problem  only  for  small  (nine  by  nine)  matrices.  Therefore, 
only  the  eight  factors  mentioned  above  were  included  in  the  third  iteration. 


WHAT  IS  CAUSING  THE  WEATHER  TO  CHANGE? 
A DELPHI  APPLICATION 


3.  The  attached  tabulation  represents  those  factors  you  have 
indicated  which  may  be  influencing  the  weather.  We  would  like 
your  judgement  of  the  relative  importance  of  each  of  these 
factors.  Specifically — in  the  short  range  of  five  to  ten 
years — how  influential  was  each  factor  in  affecting  a change 
in  the  weather?  As  a preliminary  means  of  establishing  the 
relative  importance  of  each  factor,  the  following  order  is 
defined: 


DEGREE OF
IMPORTANCE      DEFINITION                   EXPLANATION

1               EQUAL IMPORTANCE             TWO FACTORS ARE EQUAL IN THEIR
                                             ABILITY TO AFFECT WEATHER CHANGES

3               WEAK IMPORTANCE OF ONE       THERE IS SOME CONVICTION THAT ONE
                FACTOR OVER ANOTHER          FACTOR HAS MORE INFLUENCE THAN
                                             THE OTHER

5               STRONG IMPORTANCE OF ONE     STRONG BELIEF THAT LOGICAL CRITERIA
                FACTOR OVER ANOTHER          EXIST TO SHOW THAT ONE FACTOR IS
                                             MORE IMPORTANT THAN THE OTHER FACTOR

7               DEMONSTRATED IMPORTANCE      ABSOLUTE CONVICTION AS TO THE
                                             IMPORTANCE OF ONE FACTOR OVER ANOTHER

2, 4, 6         INTERMEDIATE VALUES          WHEN COMPROMISE IS NEEDED

RECIPROCALS                                  IF FACTOR i HAS VALUE X WHEN COMPARED
OF ABOVE                                     TO FACTOR j, THEN j HAS THE RECIPROCAL
NON-ZERO                                     VALUE 1/X WHEN COMPARED WITH i
NUMBERS


With  these  definitions,  please  evaluate  each  of  the  factors  in 
the  left  column  against  the  factors  listed  in  the  top  row. 


COMPARISON OF FACTORS WITH RESPECT TO INFLUENCE ON THE WEATHER

[Blank 8 x 8 comparison form. The rows and the columns list the same eight factors, in this order:

1. POLLUTION OF THE ATMOSPHERE (MANY CAUSES)
2. SHIFT IN LONG WAVE PATTERN (AND/OR JET STREAM)
3. CHANGE IN OZONE LAYER
4. CHANGE IN OCEAN CURRENTS
5. SOLAR CYCLES
6. HEAT ISLAND EFFECT OF LARGE CITIES
7. CHANGES IN THE EARTH'S ALBEDO (INCLUDING AGRICULTURE)
8. GEOTHERMAL HEAT SOURCE VARIATIONS OVER A LARGE AREA]


The  instructions  for  the  third  iteration  questionnaire  are  shown  on  page  9-7.  Each  panel 
member  was  asked  to  compare  each  of  the  factors  against  every  other  factor,  one  at  a time,  and 
then  to  indicate  the  degree  of  influence  one  factor  had  over  the  other  in  affecting  a change  in  the 
weather. 

For example, items in the vertical column were compared against solar cycles across the
top. If it were believed that there existed logical criteria to show that pollution
is more important than solar cycles in influencing the weather, the panel member would place
a "5" in the first row, fifth column position. Then the reciprocal value "1/5" would be placed
in the opposite position--in the fifth row, first column.

When two items were rated equal in their ability to influence the weather, or when a factor
was rated against itself, a "1" would be placed in that position. Thus, the main diagonal of the
matrix would contain only 1's.
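
The filling rules just described can be sketched in a few lines. This is only an illustration: the function name `reciprocal_matrix`, the zero-based indexing, and the choice to leave unjudged pairs at 1 are assumptions, not part of the questionnaire.

```python
from fractions import Fraction


def reciprocal_matrix(n, judgments):
    """Build an n x n pairwise comparison matrix.

    judgments maps (i, j), with i < j and zero-based indices, to the
    importance of factor i over factor j. Pairs not listed stay at 1.
    """
    a = [[Fraction(1)] * n for _ in range(n)]   # 1's on the main diagonal
    for (i, j), value in judgments.items():
        a[i][j] = Fraction(value)               # e.g. 5 in row i, column j
        a[j][i] = 1 / Fraction(value)           # reciprocal in row j, column i
    return a


# The example from the text: pollution (row 1) rated 5 over solar cycles (column 5).
matrix = reciprocal_matrix(8, {(0, 4): 5})
```

Exact fractions are used so that an entry such as 1/5 is stored without rounding.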

The responses of all panel members were analyzed to obtain the median. With this information,
the matrix shown on page 9-9 was obtained. The largest eigenvalue of this matrix is slightly less
than 9, and, thus, not too far from the consistent value 8. The eigenvector corresponding to this
largest eigenvalue, normalized so that the sum of the individual terms is 1, is given by:

X = [0.12 0.18 0.13 0.10 0.26 0.06 0.11 0.04]
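
The report does not say how the eigenvalue problem was solved. A minimal sketch, assuming plain power iteration (one common way to obtain the largest eigenvalue and its eigenvector for a positive matrix), is given here. The 3 x 3 matrix is a hypothetical, perfectly consistent example, not the panel's matrix, and `consistency_index` is the usual index associated with Saaty's method rather than something quoted from the report.

```python
def principal_eigen(matrix, iterations=200):
    """Estimate the largest eigenvalue and its eigenvector (normalized to
    sum to 1) of a positive square matrix by power iteration."""
    n = len(matrix)
    w = [1.0 / n] * n
    lam = float(n)
    for _ in range(iterations):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        lam = sum(v)              # because sum(w) == 1, sum(A w) estimates the eigenvalue
        w = [x / lam for x in v]  # renormalize so the terms again sum to 1
    return lam, w


def consistency_index(lam, n):
    """(lam - n) / (n - 1): zero for a perfectly consistent matrix."""
    return (lam - n) / (n - 1)


# Hypothetical consistent example: a[i][j] = weights[i] / weights[j].
weights = [0.6, 0.3, 0.1]
a = [[wi / wj for wj in weights] for wi in weights]
lam, w = principal_eigen(a)
```

For a consistent matrix the eigenvalue equals the number of factors and the eigenvector reproduces the underlying weights. With eight factors and an eigenvalue just under 9, the same index would be just under (9 - 8)/7, or about 0.14.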
COMPARISON OF FACTORS WITH RESPECT TO INFLUENCE ON THE WEATHER

(Median panel responses. Columns follow the same order as the rows. A "." marks an entry that is illegible in the source.)

                                                         1     2     3     4     5     6     7     8

1. POLLUTION OF THE ATMOSPHERE (MANY CAUSES)             1     .     .     .    1/5    2     3     3
2. SHIFT IN LONG WAVE PATTERN (AND/OR JET STREAM)        .     1     5     .     .     2     .     5
3. CHANGE IN OZONE LAYER                                 .    1/5    1     .     .     3     .     3
4. CHANGE IN OCEAN CURRENTS                              .     .     .     1    1/3    2     .     3
5. SOLAR CYCLES                                          5     .     .     3     1     .     3     5
6. HEAT ISLAND EFFECT OF LARGE CITIES                   1/2   1/2   1/3   1/2   1/4    1     .     .
7. CHANGES IN THE EARTH'S ALBEDO (INCLUDING AGRICULTURE) 1/3   .     .     .    1/3    .     1     3
8. GEOTHERMAL HEAT SOURCE VARIATIONS OVER A LARGE AREA  1/3   1/5   1/3   1/3   1/5    .    1/3    1


The Delphi panel was then asked to reevaluate their previous inputs in light of this new information.
The results of this fourth iteration did not change the matrix entries. Thus, the
figures above are really the final result.

5.  Conclusion. 

The  several  conclusions  resulting  from  this  exercise  fall  into  two  areas. 

First,  with  regard  to  the  experiment  itself,  it  must  be  emphasized  how  important  it  is  that 
the  instructions  in  the  beginning  be  clear  and  understood  by  all  of  the  participants.  In  this 
experiment,  even  to  the  last  iteration,  there  were  one  or  two  panel  members  who  were  unclear 
about  the  purpose  of  the  effort.  Also,  adequate  time  must  be  furnished  to  the  participants  to 
accurately  complete  the  questionnaires.  In  this  exercise,  which  was  generally  accomplished  in 
class  during  a break  period,  not  enough  time  was  afforded  the  panel  members  to  really  evaluate 
their  inputs,  or  to  furnish  sound  reasons  for  their  positions. 
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The  second  area  deals  with  the  results  of  the  study.  It  appears  from  the  eigenvector  on  page 
9-9  that  the  dominant  factor  influencing  the  weather  is  the  solar  cycle.  One  must  remember, 
however,  that  we  have  assumed  that  the  weather  is  changing.  We  threw  out  the  reason  that  the 
changes  we  have  experienced  are  within  normal  probability  limits. 


CHAPTER  10 


RESULTS  OF  A SINGLE  STATION  FORECASTING  EXPERIMENT 

Robert  G.  Miller 
Roger  C.  Whiton 
Michael  J.  Kelly,  Jr 

Headquarters  Air  Weather  Service 
Scott  Air  Force  Base,  Illinois 

ABSTRACT 


This paper describes a forecasting experiment performed using actual single
station data for Rickenbacker AFB, Ohio. The model used, suggested by
Miller (1968), employs the principle of multivariate regression in a Markov
process. The model predicts the probability distribution of the following
observed weather elements: temperature, dew-point depression, station
pressure, crosswinds, wind direction and speed, sky cover, ceiling, visibility,
and weather elements including haze and smoke, blowing conditions,
fog and ground fog, freezing precipitation, rain and drizzle, snow, rain
showers, snow showers, and also the condition of no weather. At forecast
time the system uses all of these elements as predictors, plus day of year,
time of day, and the most recent 3-hour pressure change. An extension
is being formulated using eigenfunctions which allows predictions at any
future time, not just at discrete points. Results of nearly 30,000 independent
forecasts are described, where comparisons using the Brier score
and hits are made against persistence and conditional climatology (ceiling
and visibility only).

1.  INTRODUCTION 

The  Air  Weather  Service  (AWS)  requires  a variety  of  approaches  to  weather  forecasting  in 
order  to  provide  military  weather  support  across  a full  spectrum  of  conflict  scenarios.  Peacetime 
weather  support,  strategic  and  tactical  activities,  and  support  to  command  and  control  agencies 
demand  a highly  developed,  computer  based,  centralized  production  system.  In  certain  tactical 
conflicts,  on  the  other  hand,  there  is  a need  for  a stand-alone  forecasting  method,  a technique 
that  can  be  used  skillfully  by  a weatherman,  when  circumstances  deny  his  standard  communications. 

To  meet  the  need  for  such  a capability,  we  conducted  a development  effort,  known  as  the 
Single  Station  Forecasting  Experiment.  This  paper  will  cover  the  purpose  of  the  experiment,  the 
data  used,  the  methodology  employed,  the  results  obtained,  and  possible  extensions  to  the  model. 

2.  PURPOSE 

There were at least six reasons for undertaking this experiment: (1) The AWS Commander
requested it, (2) A single station capability was strongly advocated by the AWS wing commanders,
(3) Military exercises suggested the desirability of a stand-alone forecast capability, (4) Circumstances
of no communication with the centralized production system required it, (5) A fast, compact
simulator of both weather observations and forecasts was required for input into the Defense
Department's war gaming models, and (6) Automated statistical forecasting algorithms under
development, such as found in the Modular Automated Weather System test at Scott AFB, could
use multiple elements as predictors.

The  capability  would  have  value  in  two  typical  operational  scenarios: 

• New Forecaster - a relatively new, inexperienced forecaster deployed to Europe
or Korea--first time in area, in need of guidance.

• Army Field Support - a non-communication situation. Must provide weather forecasts
having only the current observation.

The goal was to provide the AWS forecaster with a stand-alone capability for making probability
forecasts.

3.  DATA 

Let  us  address  the  details  of  the  experiment.  First  the  data  specifications: 

Specifications*: 

Location--Rickenbacker  AFB,  Ohio 
Forecast  intervals--3  hours 
Predictors  (single  station  only): 

Day  of  year 
Time  of  day 

Temperature-dew point depression

Sea  level  pressure 

Cross  wind 

Wind  rose 

Wind  speed 

Temperature 

Sky  cover 

Ceiling 

Visibility 

Weather  (smoke,  dust  or  haze;  blowing  snow,  dust  or  sand;  fog;  frozen 
precipitation;  rain  or  drizzle;  snow;  rain  showers;  snow  showers; 
thunderstorms;  none) 

Pressure  change  (past  3 hours) 

Elements predicted--same as above except DOY, TOD, which are deterministic
and ΔP which is irrelevant

Data sample--10 years dependent, 10 years independent (29,154 new forecast
situations)

Note the absence of vital cloud information: cloud layer height, cloud layer amount,
and cloud type for low, middle, or high conditions.

4.  METHODOLOGY 

The  statistical  method  employed  in  this  experiment  is  that  of  multivariate  regression, 
suggested  by  Miller  (1968).  That  is,  many  variables  are  predicted  with  the  same  set  of  predictors. 
One  other  condition  also  exists.  All  of  the  predictors  are  either  one  or  zero  (on  or  off).  The 
regression  predicts  the  probability  that  a condition  is  on  or  off.  The  following  depicts  such  an 
arrangement. 


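[Figure: the OBSERVATION, a vector of ones and zeros for the predictor elements (DOY, TOD, TD, ..., ΔP), is applied to the matrix of regression coefficients. Rows of the matrix are the predictor elements (for example, CIG < 200*); columns are the PREDICTAND ELEMENTS. The result is the row of PROBABILITY PREDICTIONS.]

*See Appendix for details
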


The predictors are shown as rows, the predictands as columns. Note that both involve the
exact same elements. For this problem there were 153 predictors.

The objective is to multiply the observation, made up of ones and zeros, times the regression
coefficients in a particular column, such as the one shown for ceiling <200'. The result is the
probability that in three hours, a ceiling <200' will be observed. That probability is entered into
the probability prediction row, as are all the other predicted probabilities.

Mathematically this is expressed as

$$P = O' A$$

where P is the row vector of predicted probabilities, O' is the transpose of the observation
column vector O, and A is the matrix of regression coefficients where each column constitutes
the regression coefficients for a particular predictand. This formulation generalizes to

$$P_T = O' A^T$$

when an estimate of the probability is desired, under a Markov assumption, for T units of time
into the future (here A^T denotes A raised to the power T, not a transpose). The basic unit of
time in this experiment was 3 hours. Therefore, to estimate the probabilities for 6 hours, T = 2.
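
As a minimal sketch of this calculation, the fragment below applies a coefficient matrix once per 3-hour step to a zero-one observation vector. The 2 x 2 matrix and its values are hypothetical and chosen only to keep the arithmetic visible; the experiment's own matrix is far larger, and the helper names `mat_mul` and `forecast` are assumptions.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def forecast(observation, coeffs, steps):
    """Return the observation row vector times coeffs raised to the power steps
    (each step is one 3-hour interval)."""
    row = [list(observation)]          # the observation as a 1 x n row vector
    for _ in range(steps):
        row = mat_mul(row, coeffs)     # apply the coefficient matrix once per step
    return row[0]


# Hypothetical two-category example (for instance "ceiling low" / "ceiling high").
A = [[0.7, 0.3],
     [0.2, 0.8]]
obs = [1, 0]                           # the first category is observed now
p3 = forecast(obs, A, 1)               # 3-hour probabilities
p6 = forecast(obs, A, 2)               # 6-hour probabilities
```

Repeated multiplication only makes sense when, as in this experiment, the predictors and predictands are the same elements, so that the coefficient matrix is square.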

5.  RESULTS 

The following summarizes the results for independent 3 and 6 hour forecasts on 29,154
forecast situations:

• Regression is superior to persistence on number of hits for all variables
except snow at 3 hours.

• Regression is superior to conditional climatology in terms of hits and
Brier score for visibility and ceiling.

Specifically, for 3 hour forecasts, the comparison between multivariate regression and persistence
is shown below in terms of the number of correct forecasts (hits):


                                    NUMBER OF HITS
Weather Elements           Regression    Persistence    Difference

Wind speed                    28022         27761         + 261
Cross wind                    28932         28848         +  84
Temperature                   22299         22152         + 147
Visibility                    24061         23317         + 744
Ceiling                       21360         20995         + 365
Sky cover                     17361         17313         +  48
Rain                          27990         27944         +  46
Rain showers                  28327         27902         + 425
Snow                          28427         28496         -  69
Snow showers                  28779         28616         + 163
Thunderstorms                 28894         28729         + 165
Freezing precipitation        29114         29096         +  18
Smoke or haze                 25292         25026         + 266
Blowing snow or sand          29100         29090         +  10
No weather                    24063         23776         + 287

In making this comparison, the probabilistic regression forecasts were categorized by
selecting the condition with the highest forecast probability. Remember that persistence is a
formidable competitor at 3 hours. The visibility and ceiling results are especially impressive.
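
The categorization and hit count described here can be sketched as follows. The data are hypothetical (three forecast situations, two categories), and the function names are assumptions chosen for illustration.

```python
def categorical_forecast(probabilities):
    """Index of the category with the highest forecast probability."""
    return max(range(len(probabilities)), key=lambda k: probabilities[k])


def count_hits(forecast_categories, observed_categories):
    """Number of forecasts whose category matches the later observation."""
    return sum(f == o for f, o in zip(forecast_categories, observed_categories))


# Hypothetical data: 0 = no rain, 1 = rain.
initial = [0, 0, 1]                       # category observed at forecast time
verifying = [0, 1, 1]                     # category observed 3 hours later
regression_probs = [[0.9, 0.1], [0.4, 0.6], [0.3, 0.7]]

regression_hits = count_hits(
    [categorical_forecast(p) for p in regression_probs], verifying)
persistence_hits = count_hits(initial, verifying)   # persistence forecasts "no change"
```

Persistence simply reuses the category observed at forecast time, which is why it is hard to beat at short lead times.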
The 6 hour comparisons are as follows:

                                    NUMBER OF HITS
Weather Elements           Regression    Persistence    Difference

Wind speed                    27984         27390         +  594
Cross wind                    28928         28766         +  162
Temperature                   19152         17253         + 1899
Visibility                    23439         21807         + 1632
Ceiling                       19423         18300         + 1123
Sky cover                     15066         14162         +  904
Rain                          27870         27528         +  342
Rain showers                  28323         27706         +  617
Snow                          28374         28238         +  136
Snow showers                  28775         28544         +  231
Thunderstorms                 28890         28656         +  234
Freezing precipitation        29110         29076         +   34
Smoke or haze                 23431         23104         +  327
Blowing snow or sand          29096         29084         +   12
No weather                    22344         21664         +  680

For verifying the probabilities, regression is compared with the stiff competitor, conditional
climatology, using the well known Brier score. Smaller values are better. The differences are
highly significant statistically. That is, the likelihood of achieving differences of these magnitudes
by pure chance is practically zero.
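
The report does not write out the score. A minimal sketch, assuming the original multi-category form of the Brier score (squared differences between forecast probabilities and the zero-one outcome, summed over categories and averaged over forecasts), is shown with hypothetical forecasts:

```python
def brier_score(forecasts, outcomes):
    """Mean over forecasts of the squared error summed across categories.

    forecasts: list of probability lists, one per forecast situation.
    outcomes:  list of observed category indices.
    """
    total = 0.0
    for probs, observed in zip(forecasts, outcomes):
        total += sum((p - (1.0 if k == observed else 0.0)) ** 2
                     for k, p in enumerate(probs))
    return total / len(forecasts)


# Hypothetical forecasts for a two-category element.
forecasts = [[0.8, 0.2], [0.5, 0.5]]
outcomes = [0, 1]
score = brier_score(forecasts, outcomes)
```

In this form a perfect set of forecasts scores 0 and the worst possible score is 2.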

COMPARATIVE STATISTIC
Brier Probability Score*

          Weather                        Conditional
          Element       Regression       Climatology      Difference

3 hr      Visibility      .2564            .2732            .0169**
          Ceiling         .3755            .4043            .0288**

6 hr      Visibility      .2998            .3175            .0177**
          Ceiling         .4397            .4763            .0366**

6.  DISCUSSION 

The statistical method employed does not require a large computer to make the forecasts. It
did require some heavy computing to achieve the results, for which we acknowledge the assistance
of these installations: Saint Louis University, the USAF Environmental Technical Applications
Center, the Military Airlift Command Data Automation, and the Defense Commercial Communications
Office.

The  more  important  features  of  the  procedure  are: 

Skillful  - Consistently  superior  to  persistence  and  conditional  climatology. 

Objective  - No  judgment  is  needed.  Two  different  people  should  get  the 
identical  answer. 

Distribution  Free  - No  need  to  make  any  assumption  such  as  normality. 


* Smaller  values  are  better 
** Highly statistically significant
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Multivariate  - Same  predictors  used  for  many  predictands. 


Fast  - Operates  in  real  time. 

Easy  to  Use  - Can  be  run  on  a small  calculator  or  by  hand. 

Asynoptic  Data  - Should  a special  observation  be  warranted,  a new  prediction  can 
be  made  with  these  data.  PIREPs,  radar,  satellite,  or  intermediate  times  are  no 
real  problem. 

Nonlinear  - The  zero-one  predictors  are  weighted  separately  over  the  range  of 
the  original  weather  variable. 

Interpretable  - To  see  the  effect  of  any  observed  condition  on  the  predicted 
probability,  just  look  at  its  coefficient. 

Variable  Threshold  - It  is  a simple  matter  to  set  new  limits  on  the  predictand 
desired.  Merely  add  the  coefficients  or  the  probabilities  for  the  categories 
needed.  Since  all  weather  elements  have  only  a finite  number  of  digits  of 
accuracy,  it  will  require  only  the  inclusion  of  each  possible  value  as  a predictand. 

Growth Potential - Improvements in statistical methods, computers, observational
accuracy, or new measurements, like satellite, radar, solar, can be added
to make the forecasts better. Incidentally, other things might also enhance the
forecast accuracy such as: one hour equations, stepwise selection of predictors,
time-change predictors, and interactions among predictors.
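
The "Variable Threshold" feature above amounts to adding category probabilities. As a minimal sketch, the fragment below uses the six ceiling verification categories of Table A-1 and a set of hypothetical forecast probabilities; the function name `prob_below` is an assumption.

```python
# Upper bounds (ft) of the six ceiling verification categories in Table A-1.
CEILING_BOUNDS = [200, 500, 1000, 3000, 10000, float("inf")]


def prob_below(threshold, bounds, probabilities):
    """Probability that the element falls below a threshold that coincides
    with a category boundary, obtained by adding category probabilities."""
    if threshold not in bounds:
        raise ValueError("threshold must be one of the category boundaries")
    last = bounds.index(threshold)
    return sum(probabilities[: last + 1])


# Hypothetical forecast probabilities for the six categories.
probs = [0.05, 0.10, 0.15, 0.30, 0.25, 0.15]
p_below_1000 = prob_below(1000, CEILING_BOUNDS, probs)
```

A threshold that does not fall on a category boundary cannot be formed this way, which is why the text suggests carrying each possible value as its own predictand.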

7.  EXTENSIONS  OF  THE  MODEL 

To make predictions at 6 hours, the 3 hour regression coefficient matrix was squared utilizing
a Markov assumption. Another approach to making a 6 hour forecast from the 3 hour regression
coefficients would have been to enter the 3 hour forecasted probabilities into the observation
vector and have it reprocessed. A more general alternative would be to use eigenfunctions. This
would, in principle, permit predictions into the future with time as a continuum.
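
The report does not give the eigenfunction formulation. One standard reading, offered only as a sketch and assuming the coefficient matrix A is diagonalizable (the symbols V, Λ, and λ_k are introduced here and do not appear in the source), is:

```latex
A = V \Lambda V^{-1}, \qquad
A^{t} = V \Lambda^{t} V^{-1}, \qquad
P_t = O' A^{t} = O' \, V \, \mathrm{diag}\!\left(\lambda_1^{t}, \ldots, \lambda_n^{t}\right) V^{-1}
```

Here t is measured in units of 3 hours and need not be an integer, so t = 2 reproduces the squared matrix and t = 1/3 would correspond to a one hour forecast. Non-integer powers of negative or complex eigenvalues need care in such a scheme.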

The  formulation  of  this  latter  alternative  has  been  made  and  tested  on  a more  limited  set  of 
data  than  was  used  in  the  single  station  experiment.  It  worked  as  expected.  Furthermore,  the 
need  for  computer  storage  was  greatly  reduced,  since  eigenvectors  were  used  instead  of  a 
coefficient  matrix. 

When  the  predictors  include  time  of  day  and  day  of  year,  the  eigensolution  produces  complex 
roots  and  vectors.  A computational  difficulty  arises  when  solving  large  matrices  of  this  type. 
Research  on  this  problem  is  in  progress. 

Another extension in the model introduces the concept of generalized operators, where one
matrix of regression coefficients applies at more than one geographical location. Similar applications
have been successful in other contexts (see Harris, Bryan and MacMonegle, 1963 and
1966). Research in this area is continuing.


Appendix A: Details of the Single Station Forecast Experiment


This Appendix describes the data used and the predictor/predictand categories for the experiment.

A magnetic tape of hourly surface weather observations for Rickenbacker AFB, Ohio, was obtained
from the United States Air Force Environmental Technical Applications Center. These data were originally
hand punched and are believed to be of good quality. The expected quality of the data and
time constraints dictated that we not edit the data. The dependent (1946-1955) and independent
(1956-1965) samples consisted of 24,384 and 29,154 observations respectively. If an observation
contained a missing element, it was not used.

Table A-1 describes the elements and categories that were verified. They were selected on the
basis of operational significance. For example, read the three wind speed categories as 1) 0 to ≤ 14,
2) greater than 14 but ≤ 24 and 3) greater than 24.
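
The way an observed value becomes a set of zero-one variables can be sketched with the three wind speed verification categories just described. The helper names are assumptions, and the sketch is not taken from the report's own programs.

```python
from bisect import bisect_left

# Upper limits (kts) of the first two wind speed verification categories;
# the third category is everything greater than 24.
WIND_SPEED_LIMITS = [14, 24]


def category_index(value, limits):
    """Index of the category containing value; each upper limit is inclusive."""
    return bisect_left(limits, value)


def one_hot(value, limits):
    """Zero-one vector with a single 1 in the position of the value's category."""
    vec = [0] * (len(limits) + 1)
    vec[category_index(value, limits)] = 1
    return vec


calm = one_hot(5, WIND_SPEED_LIMITS)      # falls in the 0 to 14 category
gusty = one_hot(30, WIND_SPEED_LIMITS)    # falls in the greater-than-24 category
```

Concatenating such vectors for every element yields the observation vector of ones and zeros used by the regression.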

Table A-2 describes the predictor elements and categories. The predictors were chosen subjectively.
Elements were chosen because they were available on the data tape. Cloud type, for example,
was not used as a predictor, since it was not available for part of the 20-year period of
record. Predictor categories were defined to align with the verification categories, and to assure
an adequate number of occurrences for each category.


Table A-1. Verification Elements and Categories

                                No. of
Element                         Categories    Categories

Wind Speed (Kts)                    3         0 ≤ 14 ≤ 24 < ∞
Cross Wind (Kts)                    2         0 ≤ 14 < ∞
Temperature (°F)                    6         Absolute 0 ≤ 15 ≤ 31 ≤ 49 ≤ 67 ≤ 84 < ∞
Visibility (mi)                     6         0 < 1/2 < 1 < 2 < 3 < 6 < ∞
Ceiling Height (ft)                 6         0 < 200 < 500 < 1000 < 3000 < 10000 < ∞
Sky Cover                           4         Clear or partial obscuration; scattered;
                                              broken; overcast or total obscuration
Rain                                2         Yes or No
Rain Showers                        2         Yes or No
Snow                                2         Yes or No
Snow Showers                        2         Yes or No
Thunderstorm                        2         Yes or No
Freezing                            2         Yes or No
Fog, Haze, or Smoke                 2         Yes or No
Blowing Dust, Sand, or Snow         2         Yes or No
No Weather                          2         Yes or No
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Table  A-2.  Predictor  Elements  and  Categories 


                                No. of
Element                         Categories   Categories

Day of Year                     26           1 ≤ 14 ≤ 28 ≤ ... ≤ 366
Hour of Day (Z)                  8           00; 03; 06; 09; 12; 15; 18; 21
Temperature-Dewpoint            11           0 < 1 < 2 < 3 < 4 < 5 < 6 < 8 < 10 < 15 < 20 < ∞
  Depression (°)
Sea-Level Pressure (mb)          5           0 ≤ 1000 ≤ 1010 ≤ 1020 ≤ 1035 < ∞
Crosswind (Kts)                  2           0 ≤ 14 ≤ ∞
Wind Rose (° and Kts)           17           Calm and variable;
                                               1- 45,  0 < 10 < ∞
                                              46- 90,  0 < 10 < ∞
                                              91-135,  0 < 10 < ∞
                                             136-180,  0 < 10 < ∞
                                             181-225,  0 < 15 < ∞
                                             226-270,  0 < 15 < ∞
                                             271-315,  0 < 15 < ∞
                                             316-360,  0 < 15 < ∞
Wind Speed (Kts)                 6           0 < 2 < 6 < 10 < 15 < 20 < ∞
Temperature (°F)                13           (Absolute) ≤ 5 ≤ 10 ≤ 15 ≤ 24 ≤ 31 ≤ 40 ≤ 49 ≤ 58
                                             ≤ 67 ≤ 76 ≤ 84 ≤ 89 < ∞
Sky Cover                        4           Clear or partial obscuration; scattered;
                                             broken; overcast or total obscuration
Ceiling Height (ft)             21           0 < 200 < 400 < 500 < 600 < 700 < 800 < 900
                                             < 1000 < 1100 < 1600 < 2100 < 2600 < 3000
                                             < 3600 < 4100 < 6000 < 10000 < 11000 < 13000
                                             < unlimited; unlimited
Visibility (mi)                 12           0 < 1/2 < 3/4 < 1 < 1 1/2 < 2 < 2 1/2 < 3 < 4 < 5 < 6
                                             < 7 < ∞
Smoke, Haze, Dust                2           Yes or No
Blowing Dust, Sand, or Snow      2           Yes or No
Fog                              2           Yes or No
Rain or Drizzle                  2           Yes or No
Freezing Rain or Drizzle         2           Yes or No
Rain                             2           Yes or No
Snow                             2           Yes or No
Rain Showers                     2           Yes or No
Snow Showers                     2           Yes or No
Thunderstorm                     2           Yes or No
No Weather                       2           Yes or No
Pressure Change (3 hour, mb)     7           -∞ < -3.9 < -2.0 < -0.9 < 1.0 < 2.0 < 4.0 < ∞
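The category limits in Table A-2 turn each continuous predictor into a set of dummy (0/1) variables. The following is a minimal Python sketch of that conversion, not part of the original report. The function name `dummy_vector` and the constant `WIND_SPEED_EDGES` are hypothetical, and the handling of a value that falls exactly on a class limit (it goes to the upper class here) is an assumption, since the table does not state which side is inclusive for every element.

```python
from bisect import bisect_right

# Interior class limits for Wind Speed (Kts), taken from the Table A-2 row
# 0 < 2 < 6 < 10 < 15 < 20 < infinity, which defines 6 classes.
WIND_SPEED_EDGES = [2, 6, 10, 15, 20]


def dummy_vector(value, interior_edges):
    """Return a 0/1 list with exactly one 1 marking the class of `value`.

    Assumption: a value equal to a class limit is assigned to the upper class.
    """
    k = bisect_right(interior_edges, value)      # index of the class
    n_classes = len(interior_edges) + 1
    return [1 if i == k else 0 for i in range(n_classes)]


if __name__ == "__main__":
    print(dummy_vector(7, WIND_SPEED_EDGES))     # class 6 to 10 knots
    print(dummy_vector(25, WIND_SPEED_EDGES))    # class above 20 knots
```

In a regression one class per element is normally left out to avoid redundancy; dropping one position from the returned list would give that left-out form.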
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2 C 

C     PURPOSE
C
C     GIVEN A MATRIX OF COEFFICIENTS 'A' FROM WHICH THE REDUNDANT
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2^  C 
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29  C INCLUDE  A VECTOR  EQUAL  TO  FIRST  ROW  OF  CROUT  SUM-OF-SOUARES* 
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44  c Supplied  on  file  ’inLE2'  in  transposed  form,  it  is  re- 

C     TRANSPOSED WHILE BEING READ IN.
C
C     PROGRAM OUTPUT IS THE EXPANDED MATRIX 'B' IN ORIGINAL FORM,
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53 
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Data  input  i 
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data  input  2 
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IFILE2 

11 
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alt  output 
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!0UT 

20 

56 
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97 
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structure  of  input  data  file  iread 

• * 

98 
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61 

V 
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1 1-72 

IPROBTJ)  IIAA 

PROBLEM 
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Number,  date 
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62 
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2 

4 
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11 

REQUEST  pRt  MTRX  A ON  IPRINT 

6J 
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11 

REQUEST  WRITE  MTRX  8 ON  IOUT 

64 
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11 
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65 
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NCATEG 

I4 

number  dummy  variable  cate- 

66 
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gories 

67 
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NZ(NC> 

I4 

number  dummy  variables  in  each 

68 
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NZ(NC) 

14 

variable  category 

69 
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’-12 
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14 

70 
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Etc 

Etc 

71 

c 
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local  Number  indicatiNq  index 

72 
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73 
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NOUT(NC) 

14 
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74 
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data  File  ifilei 
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77 
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SUPPLIED) 

79 

c 

60 

c 

structure 

OF  INPUT 

data  file  IFILE2 
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ING LtFT-OIlT  ROMS  OF  MATRIX  'B.'  EARLIER  MQDEL-VERS I ON 

C     PLACED ROWS LAST IN BLOCK OF ROWS CORRESPONDING TO A VARIABLE
C     CATEGORY.  NEW VERSION INSERTS THE LEFT-OUT ROW AT ITS
C     APPROPRIATE POSITION IN ACCORDANCE WITH NUMBER OF LEFT-OUT
C     VARIABLE 'NOUT.'  REVISED VERSION ALSO DISPLAYS DUMMY VARIABLE
C     COUNT NZ(NC).
C
C     PROGRAMMER--
C     CAPT ROGER C. WHITON
C
      DIMENSION IPROB(18), NZ(3), NOUT(3), A(7,7), B(10,10),
     *          SSCP(7)
C
C     FILE CODES.  IREAD IS SYSIN.  IPRINT IS PRINTER-DIRECTED
C     SYSOUT.  'IFILE1' CONTAINS FIRST ROW OF CROUT SSCP MATRIX ('ZZ'
C     IN OTHER PROGRAMS, 'SSCP' IN THIS ONE).  'IFILE2' CONTAINS THE
C     MATRIX 'A.'  'IOUT' IS OPTIONAL PUNCH OR TAPE OUTPUT.

      IREAD = 5
      IPRINT = 6
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• 'EARTH  AND  ATMOSPHERIC  SC  I ENCES  • ///IX,,  '--PlOOITE  MATRIX  TREAT*, 



     * 'MENT--'//)
      READ (IREAD,5000) (IPROB(JX), JX = 1,18)
 5000 FORMAT (18A4)
      WRITE (IPRINT,6010) (IPROB(JX), JX = 1,18)
 6010 FORMAT (1H0, 'PROBLEM-- ', T50, 18A4)
C
C     READ FLAGS DETERMINING OUTPUT MEDIA.  'B' MATRIX IS ALWAYS
C     PRINTED, BUT IF 'IFLGB' IS ON, ADDITIONAL OUTPUT TO FILE
C     'IOUT' IS MADE.  IF 'IFLGA' IS ON, MATRIX 'A' IS ALSO OUTPUT.
C     IF 'IFLGC' IS ON, CROUT SSCP VECTOR IS DISPLAYED.
C
      READ (IREAD,5010) IFLGA, IFLGB, IFLGC
C
C     MATRIX OF COEFFICIENTS A(I,J) IN LEFT-OUT VARIABLE FORM IS CON-
C     SIDERED PARTITIONED INTO A Z0-ROW (1 IN FIRST POSITION FOLLOWED
C     BY ZEROES), AN A0-COLUMN (LEFTMOST COLUMN), AND THEN
C     NCLASS X NCLASS PARTITIONS OF COEFFICIENTS.  'NCATEG' IS THE
C     NUMBER OF DUMMY VARIABLE CATEGORIES IN THE REGRESSION SCHEME.
C     FOR EXAMPLE, A SCHEME IN CEILING, VISIBILITY AND WIND (5
C     CLASSES OF CEILING, 6 CLASSES OF VISIBILITY, 9 CLASSES OF WIND)
C     WOULD HAVE NCATEG = 3.  'NC' IS INDEX OF CATEGORIES, NC = 1,
C     NCATEG.  FOR EACH SUCH CATEGORY, NZ(NC) GIVES NUMBER OF DUMMY
C     VARIABLE CLASSES (INCLUDING LEFT-OUTS) INTO WHICH THE NC-TH
C     CATEGORY IS SUBDIVIDED.  IN THE EXAMPLE ABOVE, NZ(1) = 5,
C     NZ(2) = 6, NZ(3) = 9.  READ 'NCATEG' AND 'NZ.'
C
      READ (IREAD,5010) NCATEG



 5010 FORMAT (10I4)
      READ (IREAD,5010) (NZ(NC), NC = 1,NCATEG)
C
C     READ WHICH VARIABLES NOUT(NC) ARE LEFT OUT FROM A(I,J).  'NOUT'
C     IS THE NOUT-TH VARIABLE IN THE NC-TH CATEGORY.  READ 'NOUT,'
C     CHECKING TO SEE IT DOES NOT EXCEED 'NZ.'
C
      READ (IREAD,5010) (NOUT(NC), NC = 1,NCATEG)
      DO 100 NC = 1,NCATEG
      IF (NOUT(NC) .LE. NZ(NC)) GO TO 100
      WRITE (IPRINT,6020) NC, NOUT(NC), NZ(NC)
 6020 FORMAT (1H0, 'ERROR... ', I4, 'TH LEFT-OUT VARIABLE NUMBER WAS ',
     * I4/1X, 'EXCEEDS NUMBER OF DUMMY VARIABLES (', I4, ') IN ',
     * 'CATEGORY'/1X, 'PROGRAM STOPS.')
      STOP
  100 CONTINUE
C
C     THERE MUST BE AT LEAST ONE LEFT-OUT VARIABLE IN EACH CATEGORY.
C
      DO 200 NC = 1,NCATEG
      IF (NOUT(NC) .GT. 0) GO TO 200
      WRITE (IPRINT,6030) NC, NOUT(NC)
 6030 FORMAT (1H0, 'ERROR... ', I4, 'TH LEFT-OUT VARIABLE NUMBER WAS ',
     * I4/1X, 'SHOULD BE GREATER THAN ZERO'/1X, 'PROGRAM STOPS')
      STOP
  200 CONTINUE
C
C     CALCULATE SIZE OF LEFT-OUT MATRIX 'A' AND RESTORED MATRIX 'B'
C     AND DISPLAY RESULTS.  'NVRBLZ' IS NUMBER OF DUMMIES, INCLUDING
C     Z0.  'NVRBOT' IS NUMBER OF LEFT-OUT VARIABLES, 'NDIMA' IS
C     DIMENSIONALITY OF MATRIX 'A,' AND 'NDIMB' IS DIMENSIONALITY OF
C     MATRIX 'B.'
C
      NVRBLZ = 1
      DO 300 NC = 1,NCATEG
      NVRBLZ = NVRBLZ + NZ(NC)
  300 CONTINUE
      NVRBOT = NCATEG
      NDIMB = NVRBLZ
      NDIMA = NDIMB - NVRBOT
      WRITE (IPRINT,6040) NCATEG, NVRBLZ-1, NVRBOT, NDIMA, NDIMA,
     * NDIMB, NDIMB




 6040 FORMAT (1H , 'NUMBER DUMMY VARIABLE CATEGORIES-- ', T48, I4/1X,
     * 'NUMBER DUMMY VARIABLES EXCLUDING Z0-- ', T48, I4/1X,
     * 'NUMBER OF LEFT-OUT DUMMY VARIABLES-- ', T48, I4/1X,
     * 'DIMENSIONALITY OF LEFT-OUT MATRIX A-- ', T48, I4, ' X ', I4/1X,
     * 'DIMENSIONALITY OF PLODITE OUTPUT MATRIX B-- ', T48, I4, ' X ',
     * I4///)
C
C     DISPLAY NUMBER OF DUMMY VARIABLES IN EACH CATEGORY.
C
      WRITE (IPRINT,6043)




 6043 FORMAT (1H0, 'DUMMY VARIABLES NZ(NC) IN VARIABLE CATEGORY NC--'//
     * 1X, 10(4X, 'NC', 2X, ' NZ ')//)
      WRITE (IPRINT,6046) (NC, NZ(NC), NC = 1,NCATEG)
C
C     DISPLAY LEFT-OUT DUMMY VARIABLE NUMBERS.
C
      WRITE (IPRINT,6045)
 6045 FORMAT (1H0, 'LEFT-OUT DUMMY VARIABLES NOUT(NC) IN VARIABLE ',
     * 'CATEGORY NC-- '//1X, 10(4X, 'NC', 2X, 'NOUT')//)
      WRITE (IPRINT,6046) (NC, NOUT(NC), NC = 1,NCATEG)
 6046 FORMAT (4X, I3, 2X, I3, 4X, I3, 2X, I3, 4X, I3, 3X, I3, 4X, I3,
     * 2X, I3, 4X, I3, 2X, I3, 4X, I3, 2X, I3, 4X, I3, 2X, I3, 4X,
     * I3, 2X, I3, 4X, I3, 2X, I3, 4X, I3, 2X, I3)
C
C     READ FIRST ROW OF CROUT SSCP MATRIX.  FIRST ELEMENT IS NUMBER
C     OF CASES IN SAMPLE, NEXT IS SUM OF 'Z1', NEXT SUM OF 'Z2', ETC.
C     DIVIDE BY NUMBER OF CASES TO GET MEANS.  NOTE THAT THIS
C     CROUT IS IN LEFT-OUT VARIABLE FORM.
C
      READ (IFILE1,5015) (SSCP(JX), JX = 1,NDIMA)
 5015 FORMAT (6F12.0)
      DO 400 JX = 2,NDIMA
      SSCP(JX) = SSCP(JX) / SSCP(1)
  400 CONTINUE
C
C     DISPLAY CROUT VECTOR IF 'IFLGC' IS ON.
C
      IF (IFLGC .LE. 0) GO TO 450
      WRITE (IPRINT,6050)
 6050 FORMAT (1H0, 'FIRST ROW OF SUM OF SQUARES AND CROSS PRODUCTS ',
     * 'MATRIX (IN MEANS)-- '//)
      WRITE (IPRINT,6060)
 6060 FORMAT (1H , 4X, 'NV', 3X, 'SIGMA V', 8X, 'NV', 3X,
     * 'SIGMA V', 7X, 'NV', 3X, 'SIGMA V', 8X, 'NV', 3X, 'SIGMA V',
     * 8X, 'NV', 3X, 'SIGMA V')
      WRITE (IPRINT,6070) (JX, SSCP(JX), JX = 1,NDIMA)
 6070 FORMAT (1H , 2X, I3, 1X, E12.5, 4X, I3, 1X, E12.5, 3X, I3, 1X,
     * E12.5, 4X, I3, 1X, E12.5, 4X, I3, 1X, E12.5)
C
C     READ MATRIX A(I,J).  IT HAS BEEN PREPARED IN THE FORM OF TRANS-
C     POSE, SO READ IT IN TRANSPOSED FORM SUCH THAT WHEN A(I,J)
C     RESIDES IN CORE IT WILL BE IN ORIGINAL FORM.  ROWS 'I' WILL
C     CORRESPOND TO THE REGRESSION EQUATIONS, AND COLUMNS 'J' WILL
C     CORRESPOND TO VARIABLES IN THE EQUATIONS.  NOTE THAT THE 'A'
C     MATRIX SUPPLIED ON 'IFILE2' IS MISSING THE 'ZERO-ROW.'  THUS
C     THE READING IS INTO AREAS OF THE MATRIX BEYOND THE ZERO-ROW.
C     THE ZERO-ROW IS LATER SUPPLIED.
C
  450 READ (IFILE2,5020) ((A(I,J), I = 2,NDIMA), J = 1,NDIMA)
 5020 FORMAT (6E13.5, 2X)
C
C     SUPPLY ZERO-ROW.
C
      DO 525 J = 1,NDIMA
      A(1,J) = 0.0
  525 CONTINUE
      A(1,1) = 1.0
C
C     ELEMENT A(1,1) SHOULD BE UNITY AND TAKES ROLE OF A(0,0).  REST
C     OF ROW 1 SHOULD BE ZEROES.  COPY THIS INTO 'B.'
C
      B(1,1) = A(1,1)
      WRITE (IPRINT,6080) B(1,1)
 6080 FORMAT (1H0, 'FIRST OR "ZERO"-ROW-- '//5X, 'B(1,1) IS ',
     * E12.5/5X, 'ALL OTHER B(1,J) SET TO ZERO')
      DO 600 J = 2,NDIMB
      B(1,J) = 0.0
  600 CONTINUE
C
C     DISPLAY MATRIX 'A' IF 'IFLGA' IS ON.
C
      IF (IFLGA .LE. 0) GO TO 650
      WRITE (IPRINT,6120)
 6120 FORMAT (1H1, 'ORIGINAL MATRIX "A" IN LEFT-OUT VARIABLE ',
     * 'FORM-- '///)
      WRITE (IPRINT,6130)
 6130 FORMAT (1H , T4, 'I', T9, 'J', T15, 'A(J)', 6(7X, 'J', 5X,
     * 'A(J)'))
      DO 625 I = 1,NDIMA
      WRITE (IPRINT,6100) I
      WRITE (IPRINT,6105) (J, A(I,J), J = 1,NDIMA)
  625 CONTINUE
C
C     I-LOOP GOES DOWN THROUGH 'NCATEG' BLOCKS OF ROWS IN BOTH 'A'
C     AND 'B.'
C
  650 DO 2500 ICATEG = 1,NCATEG
C
C     'IPOINT' POINTS TO FIRST ROW OF 'B' IN PARTITION BEING TREATED.
C     'IPTSS' POINTS TO INDEX OF 'SSCP' AND 'A' ARRAYS CORRESPONDING
C     TO PROPER ELEMENT OF 'B.'
C
  660 IPOINT = 2
      IPTSS = 2
      IF (ICATEG .EQ. 1) GO TO 705
      DO 700 IX = 1,ICATEG-1
      IPOINT = IPOINT + NZ(IX)
  670 IPTSS = IPOINT - IX
  700 CONTINUE
C
C     'IVOUT' POINTS TO LEFT-OUT ROW OF B-MATRIX APPLICABLE IN PRESENT
C     BLOCK OF I-ROWS.  IT MUST BE COMPUTED ONCE FOR EACH I-ROW
C     BLOCK CORRESPONDING TO VARIABLE CATEGORY.
C
  705 IVOUT = 1
      IF (ICATEG .EQ. 1) GO TO 715
      DO 710 IX = 1,ICATEG-1
      IVOUT = IVOUT + NZ(IX)
  710 CONTINUE
  715 IVOUT = IVOUT + NOUT(ICATEG)
C
C     ADVANCE THROUGH AN I-BLOCK.  'I' IS ABSOLUTE INDEX IN 'B'
C     ARRAY.
C
  725 DO 2000 I = IPOINT, IPOINT+NZ(ICATEG)-1
C
C     SKIP THE LEFT-OUT ROW 'IVOUT.'
C
      IF (I .EQ. IVOUT) GO TO 2000
C
C     FIND THE APPROPRIATE ROW OF 'A' FROM WHICH TO TAKE THE COEFFI-
C     CIENTS USED IN PREPARING LEFT-OUT 'B.'  THIS NEED BE DONE
C     ONLY ONCE FOR EACH 'I.'
C
  740 IA = 1
      IF (ICATEG .EQ. 1) GO TO 775
      DO 770 IX = 1,ICATEG-1
      IA = IA + (NZ(IX) - 1)

  770 CONTINUE
  775 DO 780 IX = IPOINT,I
      IF (IX .EQ. IVOUT) GO TO 780
      IA = IA + 1
  780 CONTINUE
C
C     J-LOOP GOES ACROSS THROUGH 'NCATEG' BLOCKS OF COLUMNS.  FIRST
C     COLUMN IS ALWAYS OMITTED BECAUSE IT IS TREATED SEPARATELY
C     BELOW.
C
C     ZERO THE B-ACCUMULATOR ('BACCUM') USED TO SET THE 'B0' COLUMN.
C
      BACCUM = 0.0
C
      DO 1500 JCATEG = 1,NCATEG
C
C     'JPOINT' POINTS TO FIRST COLUMN OF 'B' IN PARTITION BEING
C     TREATED.
C
  790 JPOINT = 2
      IF (JCATEG .EQ. 1) GO TO 825
      DO 800 JX = 1,JCATEG-1
      JPOINT = JPOINT + NZ(JX)
  800 CONTINUE
C
C     'JENDB' IS END OF J-SCAN IN B-ARRAY.
C
  825 JENDB = JPOINT + NZ(JCATEG) - 1
C
C     FIRST SUPPLY THE 'B' CORRESPONDING TO THE LEFT-OUT 'A.'  THE
C     NOUT(JCATEG)-TH A-VALUE IS LEFT OUT.  COMPUTE ABSOLUTE INDEX
C     'JOUT' OF LEFT-OUT 'B.'
C
      JOUT = JPOINT + NOUT(JCATEG) - 1
C
C     COMPUTE INDEX LIMITS IN A-ARRAY FOR A-VALUES TO BE USED IN COM-
C     PUTING THE LEFT-OUT 'B.'  THE TRICK IS THAT ALL OF THE A-
C     VALUES WILL ALWAYS BE USED IN THIS STEP.  ONE NEED ONLY MULTIPLY
C     THEM BY APPROPRIATE SSCP ELEMENTS, ADD, AND CHANGE SIGN.
C
  840 JBGNA = 2
      JENDA = JBGNA + (NZ(1) - 2)
      IF (JCATEG .EQ. 1) GO TO 900
      DO 850 JA = 1,JCATEG-1
      JINC = (NZ(JA) - 1)
      JBGNA = JBGNA + JINC
      JENDA = JBGNA + NZ(JA+1) - 2
  850 CONTINUE
C
C     SELECTING A'S FROM ROW 'IA' AND BETWEEN 'JBGNA' AND 'JENDA' AND
C     Z'S FROM CORRESPONDINGLY J-INDEXED SSCP-VECTOR, PERFORM MULTI-
C     PLICATION TO PREPARE THE LEFT-OUT 'B.'
C
  900 B(I,JOUT) = 0.0
      DO 950 JA = JBGNA,JENDA
      B(I,JOUT) = B(I,JOUT) + (A(IA,JA) * SSCP(JA))
  950 CONTINUE
      B(I,JOUT) = -B(I,JOUT)
C
C     SAVE THIS LEFT-OUT 'B' AS CORRECTION TERM TO BE APPLIED TO THE
C     OTHER A'S IN ITH ROW OF B-ARRAY.
C
  960 D = B(I,JOUT)
C
C     ACCUMULATE ALL LEFT-OUT B'S FOR THE ROW 'I' BEING TREATED.  THE
C     ACCUMULATED TOTAL WILL LATER BE USED IN PREPARING ELEMENTS
C     B(I,1).
C


  965 BACCUM = BACCUM + D
C
C     NOW THE TASK IS TO FILL THE REMAINING B'S IN ROW 'I' FOR
C     J = JPOINT,JENDB FOR J NOT EQUAL TO 'JOUT' (ALREADY COMPUTED,
C     SKIP).
C
C     FIND JA-INDEX OF ARRAY 'A' CORRESPONDING TO THE 'JPOINT' ELEMENT
C     OF ARRAY 'B.'  THEN EXTRACT NZ(JCATEG)-1 A-VALUES FROM ARRAY,
C     CORRECT THEM WITH D, AND STORE IN ASCENDING LOCATIONS OF 'B,'
C     SKIPPING THE 'JOUT' ELEMENT ALREADY TREATED.
C
      JA = 1
      IF (JCATEG .EQ. 1) GO TO 1000
      DO 980 JX = 1,JCATEG-1
  970 JA = JA + (NZ(JX) - 1)
  980 CONTINUE
 1000 J = JPOINT
 1010 JA = JA + 1
      IF (JA .GT. NDIMA) GO TO 1500
      TEMPO = A(IA,JA) + D
      IF (J .EQ. JOUT) J = J + 1
      IF (J .GT. JENDB) GO TO 1500
      B(I,J) = TEMPO
      J = J + 1
      IF (J .GT. JENDB) GO TO 1500
      GO TO 1010
C
C     PROGRAM BRANCHES TO HERE WHEN A J-CATEGORY IS COMPLETE.  THIS
C     IS TERMINATION OF J-LOOP.
C
 1500 CONTINUE
C
C     WHEN ALL J-CATEGORIES FOR A GIVEN I-ROW ARE COMPLETE, THE I-
C     ROW ADVANCE LOOP TERMINATES HERE.  NEW I-ROW IN BLOCK OF I-
C     ROWS CONSTITUTING PRESENT I-CATEGORY.
C
C     BEFORE GOING TO NEW I-ROW, SUPPLY THE 'B0' TERM FOR THE ROW
C     BEING TREATED.  THIS IS SUM OF LEFT-OUT B'S ('BACCUM') SUBTRACTED
C     FROM THE 'A' CORRESPONDING TO THIS I-ROW.  IN OTHER WORDS,
C     THE B(I,1) TERMS ARE EQUAL TO THE MEANS OF THE ASSOCIATED
C     PREDICTANDS (NOT PREDICTORS).  'IA' HAS ALREADY BEEN COM-
C     PUTED AS THE I-INDEX OF 'A' CORRESPONDING TO THE I-INDEX
C     OF 'B.'
C
 1900 B(I,1) = A(IA,1) - BACCUM
 2000 CONTINUE
C
C     WHEN ALL I-ROWS IN PRESENT I-CATEGORY ARE COMPLETE, I-LOOP TER-
C     MINATES HERE FOR A NEW I-CATEGORY (NEW BLOCK OF I-ROWS).
C     FIRST WE MUST TREAT THE IVOUT-TH I-ROW AS NEGATIVE OF SUM OF
C     THOSE BEFORE IT.  'B0' COLUMN ADDS TO ONE, OTHERS TO ZERO.
C
      IX = IVOUT
      DO 2050 J = 1,NDIMB
 2020 B(IX,J) = 0.0
 2050 CONTINUE
      DO 2100 J = 1,NDIMB
      DO 2090 I = IPOINT, IPOINT+NZ(ICATEG)-1
      IF (I .EQ. IVOUT) GO TO 2090
 2070 B(IX,J) = B(IX,J) + B(I,J)
 2090 CONTINUE
 2095 B(IX,J) = -B(IX,J)
 2100 CONTINUE
 2110 B(IX,1) = 1.0 + B(IX,1)
 2500 CONTINUE
C
C     WHEN PROGRAM CONTROL ARRIVES AT THIS POINT, THE B-MATRIX IS COM-
C     PLETE AND READY FOR OUTPUT.  ALWAYS PRINT THE MATRIX.  OUTPUT
C     TO FILE 'IOUT' IS OPTIONAL, DEPENDING ON OUTPUT FLAG 'IFLGB.'
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4/4 



C     IF WRITTEN, 'IOUT' CAN BE ADDRESSED TO TAPE, DISK OR CARDS, AS
C     PROVIDED IN JCL.
C
      WRITE (IPRINT,6090)
 6090 FORMAT (1H1, 'PLODITE OUTPUT MATRIX'///)
      WRITE (IPRINT,6095)
 6095 FORMAT (1H , T4, 'I', T9, 'J', T15, 'B(J)', 6(7X, 'J', 5X,
     * 'B(J)'))
      DO 3000 I = 1,NDIMB
      WRITE (IPRINT,6100) I
 6100 FORMAT (1H0, T2, I3)
      WRITE (IPRINT,6105) (J, B(I,J), J = 1,NDIMB)
 6105 FORMAT (T5, 2X, I3, 1X, E11.4, 2X, I3, 1X, E11.4, 2X, I3, 1X,
     * E11.4, 2X, I3, 1X, E11.4, 2X, I3, 1X, E11.4, 2X, I3, 1X, E11.4,
     * 2X, I3, 1X, E11.4)
 3000 CONTINUE
      IF (IFLGB .LE. 0) GO TO 4000
      DO 3100 I = 1,NDIMB
      WRITE (IOUT,6110) (B(I,J), J = 1,NDIMB)
 6110 FORMAT (5E15.8)
 3100 CONTINUE
      REWIND IOUT
C
C     TERMINATION.
C
 4000 STOP
      END

CROUT  PROGRAM 

C THIS PROGRAM PRODUCES REGRESSION COEFFICIENTS USING THE CROUT
C FORWARD-BACKWARD SOLUTION METHOD
C
      DIMENSION SSR(20)
      REAL*8 A(20,20),B(20,20),Y(20,20)
C
C READ N, THE ORDER OF THE SQUARE MATRIX A, AND NM, THE NUMBER OF
C PREDICTANDS APPEARING IN THE SSCP PREDICTOR-PREDICTAND MATRIX.
C
      READ(5,900) N,NM
  900 FORMAT(2I3)
      NP=N+1
      NL=N-1
C
C READ IN THE SSCP PREDICTOR MATRIX
C
      READ(5,901) ((A(I,J),J=1,N),I=1,N)
C
C READ THE SSCP PREDICTOR-PREDICTAND MATRIX.
C
      READ(5,901) ((Y(I,J),J=1,N),I=1,NM)
  901 FORMAT(7F5.0)
C
C BEGIN CALCULATING THE CROUT AUXILIARY MATRIX BY GETTING THE FIRST ROW
C
      DO 10 I=2,N
   10 A(1,I)=A(I,1)/A(1,1)
C
C COMPLETE THE CROUT AUXILIARY MATRIX
C
      DO 20 J=2,N
      DO 30 I=J,N
      JS=J-1
      DO 40 L=1,JS
      A(I,J)=A(I,J)-A(I,L)*A(L,J)
   40 CONTINUE
      IF(J.EQ.I) GO TO 30
      A(J,I)=A(I,J)/A(J,J)
   30 CONTINUE
   20 CONTINUE
C
C K REPRESENTS THE VARIABLE FOR WHICH YOU ARE DERIVING COEFFICIENTS
C
      DO 500 K=1,NM
C
C AUGMENT THE CROUT AUXILIARY MATRIX BY MAKING A LAST ROW WITH THE
C K TH ROW OF THE SSCP PREDICTOR-PREDICTAND MATRIX.
C
      DO 50 M=1,N
   50 A(NP,M)=Y(K,M)
C
C PLACE THE 1ST ELEMENT OF THE K TH ROW INTO THE LAST DIAGONAL ELEMENT
C OF THE AUGMENTED AUXILIARY
C
      A(NP,NP)=Y(K,1)
C
C START TO CALCULATE THE ADDITIONAL ROW AND COLUMN
C
      A(1,NP)=A(NP,1)/A(1,1)
C
C COMPLETE PROCESSING THE ADDITIONAL ROW AND COLUMN AS IN 20 LOOP ABOVE
C
      DO 200 J=2,NP
      JS=J-1
      DO 400 L=1,JS
  400 A(NP,J)=A(NP,J)-A(NP,L)*A(L,J)
      IF(J.EQ.NP) GO TO 200
      A(J,NP)=A(NP,J)/A(J,J)
  200 CONTINUE
C
C BEGIN CALCULATING THE COEFFICIENTS FOR THE K TH VARIABLE BY GETTING THE
C LAST COEFFICIENT
C
      B(K,N)=A(N,NP)
      DO 450 J=1,NL
      JJ=N-J
      B(K,JJ)=A(JJ,NP)
      DO 300 L=1,J
      LL=NP-L
  300 B(K,JJ)=B(K,JJ)-A(JJ,LL)*B(K,LL)
  450 CONTINUE
C
C CALCULATE THE RESIDUAL SUM OF SQUARES
C
      SSR(K)=A(NP,NP)/A(1,1)
  500 CONTINUE
C
C OUTPUT THE COEFFICIENTS
C
      DO 600 K=1,NM
      WRITE(6,902) K,SSR(K)
  902 FORMAT(1H0,I6,E14.6)
      WRITE(6,903) (J,B(K,J),J=1,N)
  903 FORMAT(4(I5,E15.6))
  600 CONTINUE
      STOP
      END
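The forward-backward Crout solution used by the program can be sketched compactly in Python. This is an illustrative sketch under stated assumptions, not the report's code: it assumes a symmetric, non-singular sum-of-squares-and-cross-products matrix, uses no pivoting, and the function name `crout_solve` and its arguments `zz` (predictor matrix) and `zy` (one predictand's cross products) are hypothetical.

```python
def crout_solve(zz, zy):
    """Solve zz * b = zy for symmetric zz by Crout reduction (no pivoting)."""
    n = len(zz)
    aux = [row[:] for row in zz]          # auxiliary matrix, built in place
    for j in range(n):
        for i in range(j, n):
            # Lower-triangular entry L[i][j] (diagonal included).
            aux[i][j] -= sum(aux[i][l] * aux[l][j] for l in range(j))
            if i != j:
                # Unit-upper entry U[j][i], obtained through symmetry.
                aux[j][i] = aux[i][j] / aux[j][j]
    # Forward pass: carry the right-hand side through the reduction.
    g = []
    for j in range(n):
        s = zy[j] - sum(aux[j][l] * g[l] for l in range(j))
        g.append(s / aux[j][j])
    # Backward pass: back-substitute for the coefficients.
    b = [0.0] * n
    for j in reversed(range(n)):
        b[j] = g[j] - sum(aux[j][l] * b[l] for l in range(j + 1, n))
    return b


if __name__ == "__main__":
    # Four cases with z0 = 1, z1 = 0, 1, 2, 3 and y = 1 + 2*z1.
    zz = [[4.0, 6.0], [6.0, 14.0]]
    zy = [16.0, 34.0]
    print(crout_solve(zz, zy))
```

The listing reaches the same result by appending the predictand's cross products as an extra row and column of the auxiliary matrix; the sketch keeps the right-hand side in a separate list only to make the two passes easier to see.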
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Appendix  C 


REGRESSION  ESTIMATION  OF  EVENT  PROBABILITIES 
(REEP) 


C DESCRIPTION
C
C PERFORMS REEP ANALYSIS (SEE CHAPTER 5) GIVEN THE CROSSPRODUCT
C MATRIX AMONG A SET OF PREDICTORS (ZZ) AND THE CROSSPRODUCT MATRIX
C BETWEEN A SET OF PREDICTANDS AND PREDICTORS (YZ).
C SCREENING IS PERFORMED AMONG THE PREDICTORS TO MAXIMIZE THE PREDICTABILITY
C FOR A GROUP OF PREDICTANDS.  ALL OF THE PREDICTANDS IN A GROUP ARE
C DUMMY VARIABLES, USUALLY MUTUALLY EXCLUSIVE AND EXHAUSTIVE.
C
C NOTE THAT THE ORDINARY F TEST IS NOT APPROPRIATE WHEN SELECTING
C PREDICTORS BY A SCREENING PROCEDURE.  IT IS RECOMMENDED THAT THE PROBABILITY
C LEVEL OF THE TEST BE MADE A FUNCTION OF THE NUMBER OF POSSIBLE PREDICTORS (IP),
C SPECIFICALLY, (1-1/(20*IP)).  THUS AN ORDINARY 95% LEVEL WITHOUT SCREENING
C WOULD BE (1-1/20).  SINCE MOST STATISTICAL TABLES SUCH AS HALD'S DO NOT COVER
C THE SITUATION OF A LARGE NUMBER OF PREDICTORS, IT IS SUGGESTED THAT THE
C INVERTED PAULSON APPROXIMATION BE USED TO ARRIVE AT THE CRITICAL F VALUE.
C A FACTOR W IS USED TO APPROPRIATELY REDUCE THE NUMBER OF SAMPLE CASES
C BECAUSE OF SERIAL CORRELATION.  FOR EXAMPLE, IF ONLY EVERY EIGHTH OBSERVATION
C CAN BE CONSIDERED INDEPENDENT, SET W EQUAL TO 8.
C
C     (EQUATION FOR THE INVERTED PAULSON APPROXIMATION)
C
C WHERE K IS THE NUMBER OF STANDARD DEVIATIONS THE ACCUMULATED PROBABILITY P
C IS FROM THE MEAN OF THE NORMAL DISTRIBUTION.


C INPUT
C
C  CARDS   COL    FMT   NAME    DESCRIPTION
C    1     1-3    I3    IP      NUMBER OF PREDICTORS.  ZZ IS AN IP X IP MATRIX
C          4-6    I3    MM      NUMBER OF PREDICTANDS.  YZ IS AN MM X IP MATRIX
C          7-9    I3    IG      NUMBER OF PREDICTAND GROUPS (E.G. CEILING,
C                               VISIBILITY, PRESSURE, ETC)
C         10-12   I3    IGG     MAXIMUM NUMBER OF PREDICTAND DUMMY CATEGORIES
C                               IN ANY OF THE IG PREDICTAND GROUPS.
C    2     1-3    I3    N(1,I)  DEFINES THE START POINT FOR EACH OF THE IG
C    .                          PREDICTAND GROUPS.
C    .     4-6    I3    N(2,I)  DEFINES THE END POINT FOR EACH OF THE IG
C    .                          PREDICTAND GROUPS.
C          FOR EXAMPLE, SUPPOSE ... 20 TO 26 IN THE YZ MATRIX ...
C          IG CARDS OF THIS TYPE
C  IG+2    1-5    F5.1  FCRIT   CRITICAL F VALUE.
C          6-10   F5.1  W       FACTOR THAT IS DIVIDED INTO THE NUMBER OF
C                               OBSERVATIONS TO GET THE NUMBER OF INDEPENDENT OBS.
C
C ZZ AND YZ MUST ALSO BE INPUT AND THIS IS DONE VIA SUBROUTINE SCRNIP.
C IN THE SCRNIP LISTING PROVIDED, ZZ AND YZ ARE READ FROM TAPE.  ZZ IS A
C 131 X 131 MATRIX AND YZ IS A 131 X 130 MATRIX WHICH MUST BE TRANSPOSED.
C
C PROGRAMMER
C
C DR ROBERT G. MILLER AND CAPT MICHAEL KELLY
C


      DIMENSION ZZ(131,131),YZ(131,131),B(25,132),Z(132,132),Y(25,132),
     *N(2,22),IS(132),TEMP(132),TRC(25),TRP(25),TRB(25),BEST(132),
     *SSR(25)
      DATA TRC,TRP,TRB/75*0./
      DATA BEST/132*0./
      READ(5,900) IP,MM,IG,IGG
  900 FORMAT(4I3)
      READ(5,901) ((N(I,J),I=1,2),J=1,IG)
  901 FORMAT(2I3)
      READ(5,902) FCRIT,W
  902 FORMAT(2F5.1)
C SCRNIP IS AN INPUT SUBROUTINE.  IP AND MM ARE PASSED TO DEFINE THE
C SIZE OF ZZ AND YZ, THE CROSS PRODUCT MATRICES MENTIONED ABOVE.
      CALL SCRNIP(IP,MM,ZZ,YZ)
      ITT=0
      NP=0
      IS(1)=1
C RESTART POINT FOR NEW PREDICTAND GROUP
 1000 IT=0
C L DEFINES THE PREDICTOR BEING CONSIDERED.
      L=1
      Z(1,1)=ZZ(1,1)
C NP INDICATES THE PREDICTAND GROUP YOU ARE WORKING WITH
      NP=NP+1
C KB AND KE DEFINE LIMITS OF THE NP PREDICTAND GROUP
      KB=N(1,NP)
      KE=N(2,NP)
      KSZE=KE-KB+1
      DO 1100 K=KB,KE
      IT=IT+1
      Y(IT,1)=YZ(K,1)
C TRANSFORM TO SUM OF SQUARES OF DEVIATIONS FROM THE MEAN.
      TRP(IT)=Y(IT,1)-Y(IT,1)**2/Z(1,1)
 1100 CONTINUE
C RESTART POINT FOR SELECTING NEXT PREDICTOR
 2200 IT=1
      X=0.
  100 DO 55 J=2,IP
C TEST TO SEE IF PREDICTOR WAS PREVIOUSLY SELECTED.
      DO 5 I=1,L
      IF(J-IS(I)) 5,55,5
    5 CONTINUE
C LOAD TEMP WITH POSSIBLE PREDICTOR
      DO 10 I=1,L
      K=IS(I)
      TEMP(I)=ZZ(K,J)
   10 CONTINUE
      LT=L+1
C LOAD SUM OF SQUARES OF PREDICTOR INTO TEMP, A WORK SPACE.
      TEMP(LT)=ZZ(J,J)
C LOAD Y
      DO 13 M=KB,KE
      ITT=ITT+1
      Y(ITT,LT)=YZ(M,J)
      IT1=IS(L)
      Y(ITT,L)=YZ(M,IT1)
   13 CONTINUE
      ITT=0
C TEST TO DETERMINE IF FIRST PREDICTOR
      IF(L-2) 27,28,28
C AUGMENT TEMP
   28 LLT=LT-1
      DO 20 LL=2,LLT
      LLN=LL-1
      DO 25 LN=1,LLN
      TEMP(LL)=TEMP(LL)-TEMP(LN)*Z(LN,LL)
   25 CONTINUE
   20 CONTINUE
   27 DO 22 LN=1,L
      TEMP(LT)=TEMP(LT)-TEMP(LN)**2/Z(LN,LN)
   22 CONTINUE
      IT=KSZE
      IF(L-2) 37,38,38
   38 DO 30 M=1,IT
      DO 35 LL=L,L
      KI=LL-1
C AUGMENT Y
      DO 31 KK=1,KI
      Y(M,LL)=Y(M,LL)-Y(M,KK)*Z(KK,LL)
   31 CONTINUE
   35 CONTINUE
   30 CONTINUE
   37 DO 32 M=1,IT
      DO 34 LL=1,L
      Y(M,LT)=Y(M,LT)-Y(M,LL)*TEMP(LL)/Z(LL,LL)
   34 CONTINUE
   32 CONTINUE
      DO 40 M=1,IT
C DIAGONAL TEST
      IF(TEMP(LT)-1.E-6) 55,55,43
   43 TF=Y(M,LT)**2/TEMP(LT)
      TRC(M)=TRP(M)-TF
C TEST FOR MAX RATIO
      IF(TF/TRC(M)-X) 40,40,45
   45 X=TF/TRC(M)
      IJ=J
      DO 47 KK=1,LT
C TRANSFER TEMP INTO BEST
      BEST(KK)=TEMP(KK)
   47 CONTINUE
      DO 400 MM=1,IT
      TRB(MM)=TRP(MM)-Y(MM,LT)**2/TEMP(LT)
  400 CONTINUE
   40 CONTINUE
   55 CONTINUE
C DO SIGNIFICANCE TEST
      IF((X*ZZ(1,1)/W-LT)-FCRIT) 2000,2000,2100
 2100 ITT=0
      L=L+1
C PREDICTOR IS SIGNIFICANT- INCORPORATE INTO Z.
      DO 550 KK=1,LT
      Z(L,KK)=BEST(KK)
  550 CONTINUE
      IIT=LT-1
      DO 420 KK=1,IIT
      Z(KK,L)=BEST(KK)/Z(KK,KK)
  420 CONTINUE
      IS(L)=IJ
      DO 450 M=1,IT
      TRP(M)=TRB(M)
  450 CONTINUE
C TEST TO SEE IF LAST PREDICTOR.
      IF(L-IP) 2200,2200,2000
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C PEKhuKM  HACK  SCjLUTIUN 

 2000 LT=L+1
      DO 5000 K=1,KSZE
C TACK ADDITIONAL ROW ON Z
      DO 5010 J=1,L
      Z(LT,J)=Y(K,J)
 5010 CONTINUE
C TACK ADDITIONAL COL ON Z
      DO 5020 J=1,L
      Z(J,LT)=Y(K,J)/Z(J,J)
 5020 CONTINUE
      Z(LT,LT)=Y(K,1)
C DETERMINE THE RESIDUAL SUM OF SQUARES
      DO 5030 J=1,L
      Z(LT,LT)=Z(LT,LT)-Z(LT,J)*Z(J,LT)
 5030 CONTINUE
C BEGIN CALCULATING THE COEFFICIENTS
      NS=LT
      NL=L-1
      B(K,L)=Z(L,NS)
      DO 5040 J=1,NL
      JJ=L-J
      B(K,JJ)=Z(JJ,NS)
      DO 5050 I=1,J
      LS=NS-I
      B(K,JJ)=B(K,JJ)-Z(JJ,LS)*B(K,LS)
 5050 CONTINUE
 5040 CONTINUE
C CALCULATE THE RESIDUAL VARIANCE
      SSR(K)=Z(NS,NS)/Z(1,1)
 5000 CONTINUE
C OUTPUT THE COEFFICIENTS
      WRITE(6,904) NP
  904 FORMAT('1 THIS IS PREDICTAND GROUP NUMBER ',I3)
      DO 6000 K=1,KSZE
      WRITE(6,905) K,SSR(K)
  905 FORMAT('0 THE RESIDUAL VARIANCE FOR THE ',I3,
     *' PREDICTAND IS ',F8.6)
      WRITE(6,906) K
  906 FORMAT('0BELOW ARE THE PREDICTOR NUMBERS AND COEFFICIENTS FOR '
     *,'THE ',I3,' PREDICTAND')
      WRITE(6,907)(J,IS(J),B(K,J),J=1,L)
  907 FORMAT(4(2I4,E15.6))
 6000 CONTINUE
C HAVE YOU DONE ALL PREDICTAND GROUPS.
      IF(NP-IG) 1000,3000,3000
 3000 STOP
      END

      SUBROUTINE SCRNIP(IP,MM,ZZ,YZ)
      DIMENSION ZZ(131,131),YZ(131,131)
      READ(1,900) ((ZZ(I,J),J=1,IP),I=1,IP)
C TRANSPOSE OF YZ IS NEEDED
      READ(2,900) ((YZ(J,I),J=1,MM),I=1,IP)
  900 FORMAT(145F7.0)
      WRITE(6,901)
  901 FORMAT('1 BELOW IS THE ZZ MATRIX')
      WRITE(6,902) ((I,J,ZZ(I,J),J=1,IP),I=1,IP)
  902 FORMAT(5(2I4,F10.0)/)
      WRITE(6,903)
  903 FORMAT('1 BELOW IS THE YZ MATRIX')
      WRITE(6,902) ((I,J,YZ(I,J),J=1,IP),I=1,MM)
      RETURN
      END



